The effects of captioning texts and caption ordering on L2 listening comprehension and vocabulary learning


Shahrekord University, Shahrekord, Islamic Republic of Iran


This study investigated the effects of captioned texts on second/foreign (L2) listening comprehension and vocabulary gains using a computer multimedia program. Additionally, it explored the caption ordering effect (i.e. captions displayed during the first or second listening), and the interaction of captioning order with the L2 proficiency level of language learners in listening comprehension and vocabulary performance. To these ends, a computer software program was designed and 200 EFL learners (100 high-intermediate and 100 low-intermediate level students) were asked to participate in the experiment. They were randomly assigned into four groups: captioned (listening to texts twice with captions), noncaptioned (listening to texts twice without captions), first captioned (listening to texts first with captions and then without captions), and second captioned (listening to texts first without captions and then with captions) groups. They listened to four audio texts (i.e. short stories) twice and took the listening and vocabulary tests, administered through the software. Results from t-tests and two-way ANOVAs showed that the captioned stories were more effective than the non-captioned ones. Moreover, the caption ordering had no significant effect on the participants' L2 listening comprehension and vocabulary performance. Finally, L2 proficiency level differences did not affect performance derived from caption ordering.



Computers  entered  school  life  in  the  late
1950s in developed countries and have been
developing  throughout  the  world  since  then
(Gunduz,  2005).  Initially,  they  were  mostly
brought  to  educational  settings  for  the
purpose  of  processing  and  displaying
information  and  their  applicability  to
teaching  was  not  greatly  emphasized.
However,  as  Brett  (1995,  p.  77)  states,
“increase in the speed, storage capacity and
memory  size  of  computers,  together  with
developments  in  the  sophistication  of
software,  now  enable  computers  to  deliver
video,  sound,  text  and  graphics”,    greatly
assisting the process of teaching and making
computers  part  of  most  classrooms.
Nowadays, many L2 materials, such as
textbooks, dictionaries, compact discs (CDs),
and videos, require computers and related
technologies. Drawing on multimedia
software programs, computer-assisted
language learning (CALL), an approach to
language teaching and learning in which
computers are used as an aid to the
presentation, reinforcement and assessment
of materials (Davies, 2002), is used for
learning and teaching language skills. With
CALL now firmly established in educational
settings, the listening skill (i.e. the ability to
understand language used by native
speakers) is no exception in making use of
multimedia CALL.
Traditionally, second language (L2)
listening comprehension was considered a
passive and receptive skill, meriting little
attention, and listening activities in L2
classrooms mostly consisted of listening to a
tape and repeating after the teacher or
dictation with a focus on bottom-up
processing, which made L2 classrooms
somewhat boring (Hayati & Vahid, 2012).
However, with listening as an active process
in  which  listeners  attempt  to  discriminate
between  sounds,  understand  vocabulary  and
structures  within  the  context  of  the
utterance,  CALL  programs,  appropriately
selected and organized, have offered a range
of opportunities to develop L2 listening skill
and  vocabulary  learning;  the  attractive
capability  of  multimedia  CALL  in
controlling and arranging various media has
introduced  audiovisual  materials  enhanced
with captions as a potential pedagogical tool
in  helping  L2  learners  improve  their
listening  comprehension  skill  and
vocabulary  learning.    However,  the  use  of
captions  in  listening  materials  (i.e.  textual
versions of the audio dialogues displayed  at
the bottom of the screen) with L2 learners at
different  proficiency  levels  has  not  been
without  controversy  (Danan,    2004;  Pujola,
2002).  On  the  one  hand,  it  is  claimed  that
captions can promote L2 learning by helping
learners  visualize  what  they  hear,
particularly  if  the  input  is  a  little  beyond
their  linguistic  control  level  (Danan,  2004).
Besides,  visual  clues  and  soundtracks  in
captioned  listening  materials  can  create  an
authentic  culture  and  language  environment
in  which  incidental  learning  can  take  place
(Yang-dong & Cai-fen, 2007). Furthermore,
captions  might  be  conducive  to  language
comprehension  by  facilitating  additional
cognitive processes, such as greater depth of
oral-word  processing  (Bird  &  Williams,
2002).  On  the  other  hand,  it  is  claimed  that
captions are more of a distraction in natural
and  meaning  focused  learning  than  help  for
L2  learners,  particularly  for  those  at  low
levels  (Taylor,  2005).  It  is  believed  that
"misuse"  of  captions  in  listening  can
potentially  prevent  the  development  of
listening  strategies  (Pujola,    2002,    p.  252).  
These unresolved issues leave a gap in L2
research and motivated us to explore the
impacts of captioning and the order of its
presentation on L2 listening comprehension
and vocabulary gains across two L2 (i.e.
English) proficiency levels through a
multimedia computer program, with the
hope of helping L2 teachers and materials
developers develop more effective
computer-based listening activities. This
objective is all the more significant in the
English as a foreign language (EFL) context
of Iran, where few attempts have been made
to develop computer programs in spite of the
potential of recent computer technology for
facilitating L2 learning.
Review of literature
The arrival of personal computers in the late
1970s  resulted  in  an  increase  in  the
development  of  Computer  Assisted
Language  Instruction  (CALI).  With  the  use
of  computers  in  language  education,
gradually  CALI  changed  into  CALL,  the
expression  chosen  at  the  1983  TESOL
convention  in  Toronto  (Tuncok,  2010).

Since the 1980s, CALL has continued its
progress  and,  for  the  last  decade  or  so,  a
number  of  studies  (e.g.,  Cushion  &
Dominique,  2002;  deHaan,  2011;
Jayachandran, 2007) have been conducted to
identify  the  effect  of  CALL  on  L2  listening
comprehension.  Although  there  are  some
studies (e.g., Chang, 2002; Dupagna, Stacks,
&  Giroux,  2007)  which  show  the  negative
effect  of  CALL  on  L2  listening
comprehension,  most  of  the  studies  (e.g.,
Pujola, 2002; Volle, 2005) have revealed the
positive effect of CALL on L2 learners’
listening comprehension.
For instance, Verdugo and Belmonte (2007)
examined  the  effects  that  digital  stories
might  have  on  the  understanding  of  spoken
English  by  a  group  of  Spanish  learners.
Results  showed  that  computer  and  internet-based  technology  could  improve  English
listening  comprehension.  Also,  in  the  EFL
context  of  Iran,  Khoii  and  Aghabeig  (2009)
and  Barani  (2011)  investigated  the  effect  of
using  computer  software  on  the
improvement  of  listening  comprehension  of
elementary  and  intermediate  L2  students
respectively. Results of both studies showed
that  the  use  of  computer  software  could
improve  the  students’  listening  ability,  as
compared  with  the  traditional  way  of
listening  to  tapes  and  answering  some
questions from their book.  
Captions  are  “on-screen  text  in  a  given
language combined with a soundtrack in the
same language” (Markham & Peter, 2003, p.
332). The process of converting the audio
content  into  text  and  displaying  it  on  a
screen  or  monitor  may  be  a  bonus  in
language  learning.  Inspired  by  this  claim,
Bird  and  Williams  (2002)  examined  how  a
bimodal  presentation  (aural  and  visual)  of
novel  words  would  impact  the  learning  of
the  words.  Vocabulary  was  presented  to
advanced  learners  of  English  in  three
conditions:  (a)  text  with  sound,  (b)  text
without  sound,  and  (c)  sound  without  text.
Results  demonstrated  that  vocabulary
presented  with  text  and  sound  (i.e.
captioning) could result in better recognition
memory  for  spoken  words  when  compared
to  the  other  two  presentation  modalities.
Also,  Pujola  (2002)  studied  the  strategies
used by Spanish-speaking ESL learners who
utilized  web-based  multimedia  videos.  She
wanted  to  find  out  whether  the  learners
would  choose  to  use  captions  or  transcripts
while watching videos. She found that those
learners  with  poorer  listening  skills  used
captions more for help with comprehension.
In  addition,  the  Spanish  learners  generally
had  better  experiences  with  captions  than
with  transcripts.  Similarly,  Grgurović  and
Hegelheimer  (2007)  reported  that  students
who  used  captions  in  a  multimedia  video
environment  would  utilize  them  more
frequently  and  for  longer  periods  of  time
than  those  who  used  transcripts.  In  another
study,  Markham  and  Peter  (2003)
investigated  the  effects  of  using  Spanish
(L1) captions, English (L2) captions, and no
captions  on  L2  students’  listening
comprehension;  results  revealed  that  the
captions  groups  outperformed  the  no
captions group. Along the same lines, Taylor
(2005)  examined  whether  captioned  video
could  benefit  beginning-level  learners.  Two
groups of Spanish learners (one in their first
year  of  Spanish  and  one  with  three  or  four
years  of  Spanish)  viewed  a  video  with  or
without  Spanish  captioning.  Third-  and
fourth-year learners who watched the videos
with  captions  performed  better  than  first-year  students,  but  scores  for  those  who  did
not view captions did not differ regardless of
level.    Also,  unlike  Markham  and  Peter's
(2003)  study,  Spanish  first-year  learners  in
Taylor's  study  found  the  captions
distracting. They reported it was difficult for
them  to  attend  to  sound,  image,  and
captions. To strike a balance between the
two sides, Guillory (1998) has reported that
captions are beneficial for beginning-level
learners when only key words are presented
as captions, rather than entire sentences
(i.e., the full text of what was spoken).
Captions can be overused (Pujola, 2002), so
it may be important to see whether listening
materials should be played once with
captions and once without and, if so,
whether captioning should come in the first
or the second viewing.
Having  gone  beyond  the  comparison  of
captioned  versus  non-captioned  materials,
Winke,  Gass,  and  Sydorenko  (2010)
investigated  the  effects  of  order  of
captioning  during  video-based  listening
activities  in  Spanish  and  less-commonly
taught  languages  with  non-Latin  scripts  in
the  US  (i.e.  Arabic,  Chinese,  and  Russian).
All the participants watched video materials
twice. The findings indicated that captioning
during  the  first  showing  of  the  videos  was
more  effective  for  the  performance  on
listening  comprehension  and  vocabulary  for
Spanish  and  Russian  learners.  They  have
suggested that "learners of a language whose
orthography  is  closer  to  that  of  the  target
language  are  better  able  to  use  the  written
modality as an initial source of information"
(p. 80).  
The  above  studies  mostly  investigated  the
role  of  captioning  in  L2  listening
comprehension,  but  it  is  very  difficult  to
generalize  findings;  most  of  the  above
studies did not group subjects by proficiency
levels;  the  differences  might  be  due  to
proficiency levels or the type of materials or
tests  used  in  the  study.  Moreover,  there  has
been  very  little  empirical  research  in  EFL
contexts  about  the  role  of  captioning  and
almost none, except one (Winke et al., 2010),
about the order of captioning in L2 listening
comprehension  and  vocabulary.  It  is
important  to  know  when  EFL  learners
should  be  exposed  to  captioning  in  audio
materials  to  better  avoid  the  misuse  of
captions. None of the studies have addressed
the aforementioned issues in the Iranian EFL
context. To fill this gap, the present study
aimed to investigate the impacts of
captioning  and  captioning  order  on  L2  (i.e.
English)  listening  comprehension  and
vocabulary  gains  through  a  multimedia
computer  program  in  an  Iranian  EFL
context.  To  these  ends,  the  following
research questions have been developed:
1.  Do  captions  improve  L2  learners’
comprehension  of  English  texts  and
learning of English vocabulary?
2.  When  an  English  text  is  listened  to
twice,  is  captioning  more  effective
when  the  first  listening  is  with
captions  or  when  the  second  listening
is with captions?
3.  Does  English  proficiency  level
interact with captioning order to affect
L2 learners’ comprehension of English
texts and learning of English vocabulary?
For  the  purposes  of  this  study,  200
intermediate  EFL  learners  were  selected
nonrandomly  through  a  placement  test
(OPT)  from  a  larger  sample  of  240  EFL
learners  from  four  private  language
institutes (i.e. AvayeDanesh, HomayeZarrin,
Payam  Parsa,  PejvakeDanesh)  in
Zarrinshahr,  a  city  in  Isfahan  Province,
where they could be accessed by the present
researchers.  They  included  both  male  (n  =
82) and female (n = 118) students whose age
ranged  from  18  to  24,  with  Persian  as  their
L1.  They  consisted  of  100  high-  and  100
low-intermediate  learners  of  English.
Meanwhile,  a  prerequisite  was  that  all  the

participants  had  passed  at  least  eight  terms
in  language  institutes;  therefore,  it  was
assumed  that  these  students  were  familiar
with  multiple-choice  listening  and
vocabulary  tests  and  had  an  adequate
command  of  listening  skill  for  the  purpose
of the study.  
Instruments and materials
To  collect  data,  this  study  made  use  of
several  instruments:  The  first  instrument
was  the  Objective  Placement  Test  (OPT,
2008) consisting of 20 multiple-choice
listening, 20 multiple-choice reading, and 30
multiple-choice  language  use  items.  This
study  used  the  OPT  to  select  200
intermediate  EFL  learners  and  place  them
into  two  L2  ability  groups  (i.e.,  high  and
low).  In  the  current  study,  the  Cronbach
Alpha reliability of this test was found to be
acceptable  (0.80).  Besides,  the  correlation
coefficient between the scores obtained from
the  OPT  and  a  retired  paper-based  TOEFL
was  found  to  be  high  (see  procedures).  The
second  one  was  a  listening  comprehension
test, consisting of 24 true/false items and 32
multiple-choice  (MC)  items.  The
participants  had  to  click  on  the  choice  true
or  false  in  15  seconds  and  the  best
alternative  in  MC  in  30  seconds  after  the
audio  prompts  were  presented  to  them  (see
Figure 1).

Finally, the third one was a vocabulary test,
consisting  of  36  multiple-choice  items  with
the  key  target  vocabulary  selected  from  the
audio  texts  and  no  cognates.  Each  test  had
five choices, one of which was “I knew this
word before listening to the text” (Figure 2).
Meanwhile, the validity of the listening
comprehension and vocabulary tests was
established through factor analysis, using
principal component analysis (PCA) on a
group of 100 participants. Moreover, the
reliability of the listening comprehension
and vocabulary tests as measured by
Cronbach’s alpha in the current study was
found to be 0.81 and 0.85 respectively.
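
The reliability coefficient reported above can be illustrated with a minimal sketch of how Cronbach's alpha is computed from dichotomously scored items. The function name `cronbach_alpha` and the five-row score matrix are hypothetical and serve only as a toy illustration; they are not the study's data, and the study's own computation was presumably done in a statistics package.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a score matrix.

    item_scores: one row per test taker, one 0/1 score per item.
    """
    k = len(item_scores[0])                  # number of items
    columns = list(zip(*item_scores))        # transpose: one tuple per item
    item_var_sum = sum(pvariance(col) for col in columns)
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical toy data (NOT the study's data): 5 test takers, 4 items.
toy_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(toy_scores)
print(round(alpha, 2))
```

Alpha rises when the items vary together, that is, when the variance of the total scores is large relative to the sum of the individual item variances.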

The  audio  texts  used  in  this  study  included
four  English  short  stories,  selected  on  the
basis  of  length,  conceptual  difficulty,  and
readability from the Steps to Understanding
(Hill,  1988),  presenting  audio  materials  at
the  intermediate  level.  Each  short  story,
approximately  one  minute  in  length,  had  a
single narrator telling the story.   
To  collect  the  data,  the  OPT  was  given  to
240 L2 students. Following guidelines of the
OPT  (Hansen  &  Lesley,  2005),  their  OPT
scores  were  used  to  select  200  intermediate
(i.e.,  100  high-  and  100  low-intermediate)
EFL  learners.  Moreover,  to  ensure  the
dependability of the data, 25 of the selected
participants  (12  males  and  13  females)  also
answered  a  retired  version  of  TOEFL
(2004),  and  the  correlation  between  their
OPT  and  TOEFL  scores  was  investigated
using  the  Pearson  product-moment
correlation  coefficient,  which  turned  out  to
be  high  (0.85).  Meanwhile,  the  computer
software  through  which  the  listening  and
vocabulary  tests  were  administered  was
piloted on a sample of 20 intermediate level
L2  learners  to  assess  the  appropriacy  of  the
materials, time, wordings and instruction. In
addition,  the  construct  validity  of  the  tests
was  examined  by  PCA  in  a  sample  of  100
intermediate EFL students. Using Cattell’s
(1966)  scree  test,  56  listening  and  36
vocabulary  items  with  acceptable
eigenvalues  were  retained  for  the  further
data  collection.  To  assess  the  potential
impacts  of  captioning  and  order  of
captioning  on  L2  listening  comprehension
and  vocabulary,  the  selected  participants
were  then  randomly  assigned  into  four
groups,  each  with  50  EFL  learners:  the
caption  group  (CG),  noncaption  group
(NCG),  the  first  caption  group  (FCG),  and
the  second  caption  group  (SCG).  For  the
main trial,
1.  the  CG  listened  to  the  audio  short
stories  twice,  both  times  with
captioning (Figure 3);
2. the NCG listened to the audio short
stories twice, both times without
captioning;
3. the FCG listened to the audio short
stories twice, first time with
captioning;
4. the SCG listened to the audio short
stories twice, second time with
captioning.
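
The random assignment into four equal groups described above can be sketched as follows. This is only an illustration under stated assumptions: the participant IDs 1 to 200, the function name `assign_groups`, and the seed are hypothetical, and the source does not say whether the assignment was stratified by proficiency level, so simple (unstratified) randomization is assumed here.

```python
import random


def assign_groups(participant_ids, group_names, seed=None):
    """Shuffle participants and split them into equally sized groups."""
    rng = random.Random(seed)        # seeded only to make the sketch reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(group_names)
    return {
        name: shuffled[i * size:(i + 1) * size]
        for i, name in enumerate(group_names)
    }


# Hypothetical IDs 1..200 standing in for the 200 selected learners.
groups = assign_groups(range(1, 201), ["CG", "NCG", "FCG", "SCG"], seed=42)
for name, members in groups.items():
    print(name, len(members))
```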
After the second listening of each audio text,
the  corresponding  listening  comprehension
test  items,  followed  by  the  corresponding
vocabulary  test  items,  were  administered  to
the  participants  of  the  main  study.  Finally,
discrete-point scoring procedures (i.e. 0 for
incorrect and 1 for correct answers) were utilized
to  obtain  each  participant's  total  listening
and vocabulary scores through the software. 

Table  1  shows  the  descriptive  statistics
(mean  and  standard  deviations)  of  the  L2
(i.e.,  English)  listening  comprehension  and
vocabulary scores for the four groups across
the two proficiency levels. As demonstrated
in  the  table,  the  high-intermediate
participants  in  the  four  groups  received  a
higher mean score than the low-intermediate
ones on both listening and vocabulary items,
with  the  highest  listening  and  vocabulary
mean  scores  belonging  to  the  caption  group
(M  =  51.12,  M  =  31.48  respectively).
Moreover,  the  standard  deviations  in  the
four groups did not show great variability in
the listening and vocabulary scores.  
To answer the first research question, which
concerned  the  overall  impact  of  captioning
on  the  English  listening  comprehension  and
vocabulary  scores,  independent  t-tests  were
used,  with  the  captioning  (i.e.
captions/noncaptions)  as  the  independent
variable  and  the  listening  comprehension
and vocabulary scores as the dependent
variables in the analysis. As demonstrated in

Tables  2  and  3,  there  was  a  statistically
significant  difference  between  the  listening
mean  scores  of  the  CG  (M  =  48.84,  SD  =
2.77) and the NCG participants (M = 46.74,
SD = 3.25) at the 0.01 level, t (98) = 3.47,
p < .01. That is, the CG participants
outperformed  the  NCG  ones  on  the  L2
listening  comprehension  test.  Moreover,  the
eta  squared,  showing  the  magnitude  of  the
mean  difference,  was  found  to  be  moderate
(0.10).  Along  the  same  lines,  a  statistically
significant  difference  between  the
vocabulary  mean  scores  of  the  CG  (M  =
29.22, SD = 2.72) and the NCG participants
(M = 27.52, SD = 2.96) was found, t (98) =
2.98, p < .01. That is, the CG participants
performed better on the vocabulary test than
the  NCG  participants  did.  However,  the
magnitude  of  the  difference  in  the  means
was not large (eta squared = .083).
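
The t values and eta squared values reported above can be approximately recomputed from the summary statistics alone. The sketch below assumes 50 participants per group (as stated in the procedure), a pooled-variance independent-samples t-test, and eta squared computed as t² / (t² + df); the function name `independent_t` is hypothetical. Because the inputs are rounded means and standard deviations, the results match the reported figures only to within rounding (roughly 3.48 and 2.99 for t).

```python
import math


def independent_t(m1, sd1, n1, m2, sd2, n2):
    """Pooled-variance independent-samples t-test from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))   # standard error of the difference
    t = (m1 - m2) / se
    eta_sq = t ** 2 / (t ** 2 + df)                  # effect size for a t-test
    return t, df, eta_sq


# Listening: CG (M = 48.84, SD = 2.77) vs. NCG (M = 46.74, SD = 3.25)
t_listen, df_listen, eta_listen = independent_t(48.84, 2.77, 50, 46.74, 3.25, 50)

# Vocabulary: CG (M = 29.22, SD = 2.72) vs. NCG (M = 27.52, SD = 2.96)
t_vocab, df_vocab, eta_vocab = independent_t(29.22, 2.72, 50, 27.52, 2.96, 50)

print(f"listening:  t({df_listen}) = {t_listen:.2f}, eta squared = {eta_listen:.3f}")
print(f"vocabulary: t({df_vocab}) = {t_vocab:.2f}, eta squared = {eta_vocab:.3f}")
```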

The focus of enquiry in  the second research
question  was  the  effect  of  captioning  order
on  the  L2  listening  comprehension  and
vocabulary  scores.  To  respond,  independent
t-tests  were  employed  with  the  order  of
captioning  (first/second  captioning)  as  the
independent  variable  and  listening
comprehension  and  vocabulary  scores  as
dependent variables involved in the analysis.
As  exhibited  in  Table  4,  there  was  not  a
statistically  significant  difference  between
the  listening  mean  scores  of  the  FCG  (M  =
45.32,  SD  =  3.08)  and  SCG  participants  (M
= 46.28, SD = 2.97) at the 0.01 level, t (98) = -1.58, p
=  .117.  That  is,  the  FCG  participants'
performance on the listening comprehension
test was not significantly different from that
of  the  SCG  participants.  Naturally,  the
magnitude of the mean difference was small
(eta  squared  =  0.02).  In  line  with  these
results, as depicted in Table 5, no significant
difference  between  the  mean  scores  of  the
FCG  (M  =  26.86,  SD  =  3.28)  and  SCG
participants  (M  =  25.80,  SD  =  3.27)  was
reported, t (98) = 1.61, p = .110. The
eta squared was also very small (0.02).
That is, the FCG participants' performance
on the L2 vocabulary test was not
significantly different from that of the SCG
participants.

The  third  research  question  explored
whether L2 proficiency level interacted with
the  captioning  order  to  impact  the
participants’  L2  listening  comprehension
and vocabulary scores. To respond, separate
two-way  between-groups  ANOVAs  were
run. The results are shown in Tables 6 and 7.

The  ANOVA  revealed  that  there  was  a
statistically  significant  effect  for  the  L2
proficiency  level  on  the  listening
comprehension scores, F (1, 192) = 487,
p < .01. Based on Cohen’s (1988)
guidelines, the effect size for the proficiency
level  was  large  (partial  eta  squared  =  .71).
But,  the  interaction  effect  between
captioning  and  proficiency  level  for  the
listening  comprehension  scores  was  not
statistically  significant,  F  (3,  192)  =  .766,  p
=  .514.  Figure  4  displays  a  clear  picture  of
the  participants'  performance  in  the  two
proficiency levels. The pattern of
performance was broadly similar in the
two proficiency level groups, with the CG
receiving the highest listening mean score and
the FCG receiving the lowest one. Likewise, the
post hoc test showed that the CG performed
significantly  better  than  other  groups
including  the  FCG  and  SCG.  Also,  the
performance of the SCG was better than that
of  FCG  on  the  listening  scores  even  though
the  difference  was  not  significant  at  .01
(mean difference = .96, p = .018).

Furthermore,  as  depicted  in  Table  7,  there
was  a  statistically  significant  effect  for  the
L2  proficiency  level  on  the  vocabulary
scores, F (1, 192) = 429, p < .01, with an
effect size of .69. Following Cohen’s (1988)
guidelines, this effect size is large. In
line  with  the  results  on  the  listening
comprehension  scores,  the  interaction  effect
between captioning and proficiency level for
the  vocabulary  scores  was  not  statistically
significant, F (3, 192) = 1.20, p = .297.  
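
The partial eta squared values reported for proficiency level can be checked against the F ratios and degrees of freedom, using the standard identity partial eta squared = (F × df_effect) / (F × df_effect + df_error). The sketch below is a hedged illustration: the function name `partial_eta_squared` is hypothetical, and the inputs are the rounded F values given in the text, so the results agree with the reported .71 and .69 only to within rounding.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of proficiency level (values as reported in the text)
pes_listening = partial_eta_squared(487, 1, 192)
pes_vocabulary = partial_eta_squared(429, 1, 192)

# Captioning x proficiency interaction (non-significant in both analyses)
pes_inter_listening = partial_eta_squared(0.766, 3, 192)
pes_inter_vocabulary = partial_eta_squared(1.20, 3, 192)

print(f"proficiency, listening:  {pes_listening:.3f}")
print(f"proficiency, vocabulary: {pes_vocabulary:.3f}")
print(f"interaction, listening:  {pes_inter_listening:.3f}")
print(f"interaction, vocabulary: {pes_inter_vocabulary:.3f}")
```

The same identity shows why the interaction terms carry almost no explained variance: an F ratio near 1 with 192 error degrees of freedom yields a partial eta squared of about .01 to .02.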
Figure 5 displays how the participants in the
two  proficiency  levels  performed  on  the
vocabulary test. Again, the pattern of
performance was very similar in the two
proficiency level groups, suggesting no
interaction between proficiency and
captioning order; the CG received the
highest vocabulary mean score and the SCG
received the lowest one in both
proficiency level groups. Also, the post hoc
test  showed  that  the  CG  performed
significantly  better  than  other  groups
including  the  FCG  and  SCG.  However,
unlike  the  listening  test,  the  performance  of
the FCG was better than that of SCG on the
vocabulary scores though the difference was
not  significant  at  .01  (mean  difference  =
1.06, p = .013).

Captioned  video  and  audio  materials  for  L2
learning  are  becoming  more  common.
However,  there  is  controversy  over  whether
they  bring  more  native  voices  into  the
learning environment. The present study set
out  to  investigate  L2  learners’  use  of
captions  while  listening  to  short  stories.  In
relation to the first research question, it was
found  that  captioned  audios  aided  L2
listening  comprehension  and  vocabulary
gains to a greater degree than non-captioned
audios  did.  In  other  words,  the  captioned
group  outperformed  the  non-captioned  in
both  the  listening  comprehension  and  the
vocabulary  tests.  The  benefit  of  captioning
could  be  due  to  a  bimodal  presentation
provided  in  the  caption  group.  Perhaps,  if
one  of  the  channels  (audio  or  visual)  failed,
the  other  one  compensated  for  the  failure.
This  is  plausible  given  that  listening
comprehension  and  vocabulary  learning  are
dependent  on  the  multiple  input  modalities.
The  other  possible  reason  is  that  listening
twice to the audios with captions might have
reduced  the  difficulty  of  input  to  reach  the
optimal level or, in Krashen's (2003) terms,
i + 1. Besides, providing the two channels
could  help  in  reducing  the  level  of  stress  or
anxiety  on  the  part  of  participants  or,  in
Krashen's  words,  lowering  the  affective
filter  so  that  the  participants  in  the  caption
group  could  take  in  more  comprehensible
input.  Vanderplank  (1993)  suggests  that
captions  are  not  affected  by  variations  in
accent  or  audio  quality.  If  so,  the  captions
could  reduce  stress  and  positively  facilitate
their  aural  comprehension  or  implicit
vocabulary  learning.  This  justification  is
also supported by the results of the study by
Bird  and  Williams  (2002),  who  found  that
vocabulary  presented  with  text  and  sound
(i.e.  captioning)  could  result  in  better
recognition  memory  for  spoken  words.
Markham  and  Peter  (2003)  also  found  that
captioning  could  improve  Spanish  ESL
learners'  listening  comprehension
effectively.  Zarei's  (2009)  study,  in  which
the bimodal subtitling was reported to be an
effective  mode  for  EFL  learners  for
comprehending  English  movies  and  picking
up new words, can also partially support the
above result.  
The  other  concern  of  this  study  was  to
investigate  the  ordering  effect  of  caption
presentation. The results indicated that
when  a  short  story  was  listened  to  twice,
once  with  captioning  and  once  without,  the
order  of  viewing  had  no  significant  impact
on  either  L2  listening  comprehension  or
vocabulary.  Winke  et  al.  (2010)  argued  that
the order of captioning had an impact on the
overall  comprehension  and  vocabulary
recognition.  They  found  that  Spanish  and
Russian  learners  presented  with  captions  in
the first viewing were better able to perform
on  the  listening  comprehension  and  the
vocabulary tests than learners presented with
captions  in  the  second  viewing.  They
suggested that this was due to the important
role  of  attention  in  L2  learning.  The  results
of  the  present  study  likewise  showed  the
participants had a better performance on the
vocabulary test with the captions in the first
viewing, but unlike the study by Winke et al.
(2010), the effect was not significant, since
the mean score of the first captioning group
was only marginally better than that of
the second captioning group. Besides, the
second captioning group performed better
on the listening comprehension test, though not
significantly. It can be argued that the
captions in the first viewing seemed to help
isolate key vocabulary that the L2 learners
encountered for the first time or
perceived to be important, so they might
have paid more attention to new vocabulary
in  the  subsequent  listening  or  confirmed
their hypotheses on the meaning of unknown
words,  hence  having  a  better  performance
on  the  vocabulary  test.  That  is,  the  first
caption  viewing  might  help  further
information-gathering  on  the  vocabulary
during the second listening. If so, the second
listening  in  the  present  study  provided
additional  confirmatory/non-confirmatory
evidence  of  form-meaning  as  regards
vocabulary.  At  the  same  time,  the  first
caption  viewing  could  not  be  very
facilitative  for  L2  listening  comprehension
when  the  second  listening  was  presented
without  captioning  perhaps  because  most
participants  in  the  first  caption  group  might
have  lost  track  of  plots  or  the  main  idea  in
the audio stories. Moreover, the absence of
captions in the second listening could not
greatly help them compensate for this failure.
Rather, the absence of captions in this context
might have put more stress on them, resulting
in weaker performance on the listening
comprehension test than on the
vocabulary one. But when the first listening
was  presented  without  captioning,  the
participants' attention might have been better
drawn  to  the  incidents  and  theme  of  the
audio  stories  and  the  second  listening  with
captions  could  have  provided  additional
confirmatory/non-confirmatory  evidence  of
their comprehension or, at least, reduced the
anxiety associated with listening. All said,
the effect of ordering needs further study
before strong claims can be made about it,
given that this variable did not significantly
affect L2 listening comprehension or
vocabulary performance at the .01 level of
significance adopted here,
and there are not enough studies in the
literature to compare with and generalize the
above findings broadly.
Finally, differences in the participants'
English proficiency level did not change the
benefits derived from captioning order. In
clear  terms,  listening  twice  to  a  short  story
with  captions  was  most  effective  for  both
high-  and  low-intermediate  L2  participants;
that  is,  captioning  helped  both.  Similarly,
listening  to  a  short  story  with  captions  the
first rather than the second time was equally
beneficial  for  the  vocabulary  performance,
and listening to a short story with captions
the second time was equally useful for
listening comprehension regardless of the
L2 proficiency level. Thus, it is assumed
that captioning can be a pedagogical tool
which aids language processing and
functions similarly for upper and lower L2
proficiency levels. Ellis (2003) states that
“learning to understand  a language involves
parsing the speech stream into chunks which
reliably  mark  meaning”  (p.  77).  It  can  be
argued  that  the  captions  presented  twice
might  have  helped  the  L2  learners  see  and
be  able  to  parse  patterns  or  chunks  in  the
audio  listening  materials.  This  might  have
aided  both  high-  and  low-intermediate
participants  in  remembering  and  learning
from  the  chunks  when  they  were  repeated
(i.e.  presented  in  written  form  twice).
Meanwhile, the better mean scores obtained
by high-intermediate participants, in general,
as  compared  with  low-intermediate  ones,
could be due to better L2 ability, which was
observed  in  all  four  groups  of  the  study.
That  is  to  say,  it  was  regardless  of  caption
ordering. The investigation of patterns of
performance by the two proficiency level
groups in Figures 4 and 5 suggests that
captioning in repeated listening can be
beneficial for a range of proficiency levels,
perhaps so long as the listening materials
are suited in terms of content and
difficulty to L2 learners' proficiency levels.
However, in Taylor’s (2005) study, the
lower-level learners reportedly had more difficulty
attending to captions than upper-level
students, perhaps because they had a harder
time with the content of the video materials;
the content might have been too difficult for
them. In line with Winke et al.'s (2010)
claim, the above results of the present study
suggest that the question of whether
lower-level  students  can  benefit  from
captions  in  the  same  way  as  upper-level
learners  should  focus  more  on  the
appropriateness  of  the  complexity  level  of
L2  listening  materials  rather  than  the
appropriateness  of  the  captioning  for  L2
lower-level learners.  
According  to  Hashemi  and  Aziznezhad
(2011),  “CALL  offers  modern  English
language  teachers  many  facilities  and  novel
techniques  for  teaching  and  learning”  (p.
833). Accordingly, the effect of CALL on listening
comprehension and vocabulary learning has
received considerable attention from language
teachers and researchers. Despite the
significance of CALL in listening
comprehension or vocabulary learning and
captioning, supported by a number of
empirical studies conducted in L1 (Bird &
Williams, 2002), there is a paucity of
research on CALL-facilitated captioning
techniques in L2 listening comprehension
and vocabulary learning, particularly in EFL
contexts. The present study therefore took a
further  step  to  help  fill  this  gap  by
investigating,  firstly,  the  impact  of
captioning;  secondly,  the  effect  of
captioning  order;  and,  finally,  the  effect  of
possible  interaction  between  L2  (i.e.
English)  proficiency  and  captioning  on  L2
listening comprehension and vocabulary performance.
The  results  indicated  that  captions  had  a
beneficial  effect  on  both  L2  listening
comprehension  and  vocabulary  gains.  They
can  result  in  greater  depth  of  language
processing  by  presenting  multiple  input
modalities  and  reducing  anxiety,  and  assist
the  implicit  learning  of  vocabulary  through
the  unpacking  of  language  chunks  or
mapping  form-meaning.  Also,  the  results
revealed that the captioning order played no
significant  role  in  the  L2  listening
comprehension  and  vocabulary
performance. In other words, whether the captions
were displayed during the first or the second listening
to a short story did not significantly affect the L2
learners’ performance on the listening
comprehension and vocabulary tests.
However, this issue warrants further
investigation since small contributions
sometimes cannot be totally ignored in
educational settings. Finally, this study did
not find that L2 proficiency level differences
affected performance derived from
caption ordering. Constrained by time,
this study did not explore whether additional
listening  with  captions  or  captioning  order
would  result  in  greater  vocabulary  and
comprehension  gains.  Possibly  there  is  a
ceiling effect for captioning. Besides, the L2
participants in this study were not allowed to
toggle  captions  on  and  off  in  the  program.
Perhaps allowing L2 learners to toggle
captions on and off can provide more
information about when captions might be useful
or useless to them. Thus, future research can
transcend  limitations  observed  in  this  study
in  addressing  captioning  in  a  multimedia



Barani, G. (2011). The relationship between
computer  assisted  language  learning
(CALL) and listening skill of Iranian
EFL  learners.  Procedia  Social  and
Behavioral Sciences, 15, 4059-4063.
Bird,  S.  A.,  &  Williams,  J.  N.  (2002).  The
effect  of  bimodal  input  on  implicit
and  explicit  memory:  An
investigation  into  the  benefits  of
within-language  subtitling.  Applied
Psycholinguistics, 23(4), 509-533.
Brett,  P.  (1995).  Multimedia  for  listening
comprehension:  The  design  of  a
multimedia-based  resource  for
developing  listening  skills.  System,
23(1), 77-85.
Cattell, R. B. (1966). The scree test for the
number of factors. Multivariate
Behavioral Research, 1, 245-276.
Chang, C. Y. (2002). Does computer-assisted instruction + problem
solving = improved science outcome?:
A prior study. Educational Research,
95(3), 143-149.
Cohen, J. W. (1988). Statistical power
analysis for the behavioral sciences
(2nd ed.). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Cushion,  S.,  &  Dominique,  H.  (2002).
Applying  new  technological
developments  to  CALL  for  Arabic.
Computer  Assisted  Language
Learning, 15(5), 501-508.
Danan, M. (2004). Captioning and
subtitling: Undervalued language
learning strategies. Translators’
Journal, 49(1), 67-77.
deHaan,  J.  (2011).  Teaching  and  learning
English  through  digital  game
projects.  Digital  Culture  &
Education, 3(1), 46-55.
Davies,  G.  (2002).  Computer  Assisted
Language  Learning  (CALL).
Retrieved  January,  2012  from
Dupagna, M., Stacks, D. W., & Giroux, V.
M. (2007). Effects of video
streaming technology on public
speaking students’ communication
apprehension and
competence. Journal of Educational
Technology Systems, 35(4), 479-489.
Ellis, N. C. (2003). Constructions, chunking,
and  connectionism:  The  emergence
of second language structure. In C. J.
Doughty  &  M.  H.  Long  (Eds.),  The
handbook  of  second  language
acquisition  (pp.  63-103).  Malden,
MA: Blackwell.  
Grgurović, M., & Hegelheimer, V. (2007).
Help options and multimedia
listening: Students' use of subtitles
and the transcript. Language
Learning & Technology, 11(1), 45-66.
Guillory, H. G. (1998). The effects of
keyword captions to authentic
French video on learner
comprehension. CALICO Journal,
15(1), 89-108.
Gunduz,  N.  (2005).  Computer-assisted
language learning (CALL). Journal
of  Language  and  Linguistic  Studies,
1(2), 193-214.
Hansen, C., & Lesley, T. (2005). Placement
and evaluation package. Cambridge:
Cambridge University Press.
Hashemi, M., & Aziznezhad, M. (2011).
Computer assisted language learning
freedom or submission to machines?
Procedia-Social  and  Behavioral
Sciences, 28, 832-835.
Hayati, S. S., & Vahid, H. (2012). The
relationship  between  prior
knowledge  and  EFL  learners’
listening  comprehension:  Cultural
knowledge  focus.  Mediterranean
Journal  of  Social  Sciences,  3(1),
Hill,  L.  A.  (1988).  Steps  to  understanding.
Oxford: Oxford University Press.
Jayachandran,  J.  (2007).  Computer  assisted
language learning (Call) as a method
to develop study skills in students of
engineering  and  technology  at  the
tertiary  level.  The  Indian  Review  of
World Literature in English, 3(2), 1-7.
Khoii, R., & Aghabeig, M. (2009). Computer
software and the improvement of the
elementary EFL students’ listening
comprehension. Journal of Teaching
English as a Foreign Language and
Literature, 1(2), 89-101.
Krashen,  S.  D.  (2003).  Explorations  in
language  acquisition  and
use. Portsmouth, NH: Heinemann.
Markham,  P.  L.,  &  Peter,  L.  (2003).  The
influence  of  English  language  and
Spanish language captions on foreign
language  listening/reading
comprehension.  Journal  of
Educational  Technology  Systems,
31(3), 331-341.  
Pujola,  J.  T.  (2002).  CALLing  for  help:
Researching  language  learning
strategies  using  help  facilities  in  a
web-based  multimedia  program.
ReCALL, 14(2), 235–262.
Taylor,  G.  (2005).  Perceived  processing
strategies  of  students  watching
captioned  video.  Foreign  Language
Annals, 38(3), 422-427.
Tuncok,  B.  (2010).  A case study: Students’
attitudes  towards  computer  assisted
learning,  computer  assisted
language  learning  and  foreign
language  learning.  Unpublished
master's  thesis,  The  University  of
Arizona, US.
Vanderplank,  R.  (1993).  A  very  verbal
medium:  Language  learning  through
closed  captions.  TESOL  Journal,
3(1), 10-14.
Verdugo,  D.  R.,  &  Belmonte,  I.  A.  (2007).
Using  digital  stories  to  improve
listening  comprehension  with
Spanish  young  learners  of
English. Language Learning &
Technology, 11(1), 87-101.
Volle, L. M. (2005). Analysing oral skills in
a  voice  email  and  online
interviews. Language Learning
& Technology, 9(3), 146-163.
Winke,  P.,  Gass,  S.,  &  Sydorenko,  T.
(2010).  The  effects  of  captioning
videos  used  for  foreign  language
listening activities. Language
Learning  &  Technology,  14(1),  65-86.
Yang-dong, W., & Cai-fen, S. (2007).
Tentative  model  of  integrating
authentic  captioned  video  to
facilitate  ESL  learning.  PLA
University  of  Foreign  Languages,
4(9).  Retrieved  August  10,  2012

Zarei,  A.  A.  (2009).  The  effect  of  bimodal,
standard  and  reversed  subtitling  on
L2  vocabulary  recognition  and
recall. Pazhuhesh-e-Zabanha-ye-Khareji, 49, 65-85.