English Language Center, Isfahan University of Technology, Isfahan, Iran
Abstract: Lexical bundles, as building blocks of coherent discourse, have been the subject of
much research in the last two decades. While many of these studies have been mainly concerned
with exploring variations in the use of these word sequences across different registers and
disciplines, very few have addressed the use of particular groups of lexical bundles within
specific academic genres. To address generic variations, this research focused on anticipatory it
bundles as a particular structural group of bundles. More specifically, this study investigated
the range, frequency, and function of these word clusters in applied linguistics research
articles and postgraduate writing. Using two large corpora of research articles and
postgraduate theses, two text analysis programs, and a functional taxonomy of it bundles, this
study found that it bundles were used relatively frequently in both published and postgraduate
writing. Functional analysis showed that anticipatory it lexical bundles served a wide variety of
functions in both genres investigated. This study also revealed that some anticipatory it lexical
bundles commonly used by students in their postgraduate writing did not qualify as bundles in
research articles. As for implications, the study calls for
the incorporation of such clusters in L2 and/or EAP (English for Academic Purposes) courses.
Lexical bundles, also known as clusters and chunks (Hyland, 2008a, 2008b), were first
introduced and defined by Biber, Johansson, Leech, Conrad, and Finegan (1999). They
referred to lexical bundles as recurrent expressions, regardless of their idiomaticity and
structural status. More importantly, they considered frequency the
defining characteristic of bundles: for a word combination (e.g., on the other hand, at
the same time, it is necessary to) to count as a bundle, it must occur at least twenty times
per million words, with the additional requirement that these
occurrences be spread across at least five different texts to guard against idiosyncratic uses.
Lexical bundles are identified on the basis of frequency and breadth of use (Cortes,
2002, 2004). Fixedness in form (e.g., on the basis of, not on a basis of) and non-idiomatic
meaning are other properties of bundles. Lexical bundles have been
found to be an important part of academic discourse, among other registers (Biber et al., 1999). Such word
sequences have been classified structurally (Biber et al., 1999; Biber, Conrad & Cortes, 2004;
Biber, 2006; Jalali, Eslami Rasekh & Tavangar Rizi, 2008, 2009) as well as functionally
(Cortes, 2002, 2006; Biber & Barbieri, 2007; Hyland, 2008a, 2008b; Jalali, 2009, 2013; Jalali
& Ghayoomi, 2010). Lexical bundles can serve a wide range of discursive functions such as
organization of discourse, expression of stance, and reference to textual or external entities
(Biber & Barbieri, 2007; Jalali, 2013). Some studies conducted in this regard are briefly
reviewed below.
Since 1999, a number of studies have explored possible
differences and/or similarities in the use of bundles across disciplinary fields (Cortes,
2002, 2004; Hyland, 2008a, 2008b), registers such as conversation, fiction, news, academic
prose, classroom teaching, and non-conversational speech (Biber et al., 1999; Biber et al.,
2004; Biber & Barbieri, 2007), genres (Hyland, 2008b; Jalali, 2013), and different degrees of
writing expertise (Cortes, 2002, 2004; Jalali, 2009; Jalali et al., 2008, 2009).
Overall, these studies have indicated that lexical bundles are strong discipline, genre,
and register discriminators. This means that apart from some overlaps, each discipline, genre,
or register draws on its own set of bundles to organize its discourse, express stance, and refer
to different parts of the evolving text or elements outside the text. The findings have also
stressed that many lexical bundles favored by experts in a given disciplinary area may not be
used by novices who could be students or developing writers with varying degrees of
language proficiency and disciplinary expertise.
Interestingly, there is usually a correlation between the structural type of bundles
and the function they serve in the discourse (Biber et al., 2004); for example, anticipatory it
bundles (e.g., it should be noted, it can be seen), the subject of the present study, usually
act as metadiscourse elements (Hyland, 2000, 2008a, 2008b) or expressions of stance
(Biber, 2006). Biber et al. (1999) have shown that it clauses followed by either to (as in it is
important to note that this relationship may always be true) or that (as in it is clear that this
policy is unlikely to lead to fruitful results) are common in academic writing and their
relatively frequent presence has been substantiated in a range of academic genres (Hewings
& Hewings, 2002).
The study of this structural group of lexical bundles is important for two reasons.
First, there is some evidence that this structure can pose serious difficulty for many
non-native writers, mostly because some languages lack an anticipatory it structure
(Jacobs, 1995; Hewings & Hewings, 2002). Second, given the importance of
this structure as a metadiscursive element or a stance expression, it is worth
identifying the range of interpersonal meanings conveyed by such word clusters, as it bundles are
usually good means by which writers can express their opinions, evaluate the subject matter,
and engage with readers (Hewings & Hewings, 2002).
According to Hewings and Hewings (2002), lexical bundles starting with an
anticipatory it have four metadiscoursal or interpersonal roles: hedges (showing the speaker's or
writer's tentativeness and uncertainty about the following proposition), attitude markers
(expressing the writer's attitude toward the content), emphatics (stressing the writer's certainty about
the force and credibility of the propositional meaning), and attribution (convincing the
reader through a general or specific reference). The review of the literature showed that very
few studies have focused on the use of anticipatory it bundles within key academic
genres (see Hewings & Hewings, 2002; Hyland, 2008a). In particular, there
is a scarcity of studies addressing specific phraseological practices in different disciplinary
areas, especially with the aim of describing and explaining possible differences and/or similarities
between experts and novices in their use of these word combinations in their respective high-stakes genres.
The purpose of this study was to compare the use of one structural class of bundles in
key written academic genres of a single disciplinary area, applied linguistics, through the
use of two corpora of academic writing. The assumption was that exploring possible
variations in the use of such word combinations across genres could contribute to
a better understanding of phraseological preferences and practices in different discourse
More specifically, the study probed the use of anticipatory it lexical bundles in two
genres of applied linguistics. Applied linguistics was selected as the discipline of interest for
two reasons: (1) it has not been subject to rigorous analysis in terms of such clusters, and (2)
raising awareness of genre features through such studies can become part of its disciplinary
content. Accordingly, two corpora of research articles and postgraduate writing in applied
linguistics were employed to find the extent to which these two academic genres in a single
disciplinary area are similar to or different from each other. At the same time, by comparing
the two genres of applied linguistics, this study attempted to show the extent to which
students' use of anticipatory it bundles could be compared to that of published writers.
This study, therefore, addressed the following questions:
1. What are the most frequent four-word anticipatory it lexical bundles in applied linguistics
published and postgraduate writing?
2. To what extent is there evidence to support similarity or contrast in the range, frequency,
and function of anticipatory it lexical bundles across the two genres?
Two corpora were used in this study. The first corpus included published writing in the
discipline of applied linguistics, and the second represented students' unpublished writing
at the postgraduate level. The second corpus consisted of master's theses and doctoral
dissertations written by EFL students within the discipline of applied linguistics, with
relevance to English language teaching and translation. Each corpus is
described in more detail below. The first corpus was originally prepared by Jalali (2009) for his
study on variations in the use of lexical bundles within a single discipline: applied linguistics.
The selection of journal articles was based mostly on previous corpus-based studies
of applied linguistics (e.g., Ruiying & Allison, 2003), the
advice given by experts in the field, and access to the electronic files of papers. Table 1
presents the journals, the number of texts, and the number of words in this corpus. The
second corpus, also collected by Jalali (2009), included master's theses and doctoral
dissertations written by postgraduate EFL students during the 2004-2009 period.
Data analysis tools
Two computer programs were used in this study: AntConc 3.2.1w (Anthony, 2007) and
WordSmith Tools 5 (Scott, 2008). The former was used for the identification of lexical bundles and
concordancing, while the latter was employed only to find the number of texts in which
each bundle had been used. Both are described in more detail below.
AntConc 3.2.1w is a free concordance program designed and developed by Anthony
(2007) (see Fig. 1). This study used it to identify anticipatory "it" lexical bundles and find
their frequency. It has useful tools such as concordance, concordance plot, file view, N-grams,
collocates, word list, and keyword list that are used to analyze texts of different kinds
and lengths. The concordancer also makes it possible to see each cluster in the
textual context in which it was originally used.
Among these tools, there is one by which it is possible to identify word
combinations, clusters, or lexical bundles of different lengths and frequencies in small or
large corpora. Lexical bundles in corpora of different sizes, with their actual frequencies,
can be found and displayed by entering a set of common key words with which the bundles
collocate, such as prepositions (e.g., at, of, on) and modals (e.g., can, should, could, may),
and deciding on the minimum frequency (e.g., twenty in a corpus of one
million words) and the required number of words in clusters (i.e., three, four, five, or six).
However, as AntConc 3.2.1w could not count and display the number of different texts,
WordSmith Tools 5 (Scott, 2008) was used to identify the texts in which lexical bundles occurred.
This program is similar in many ways to AntConc 3.2.1w, but it does show the
number of texts in which bundles have been used. So once all candidate lexical bundles had been
identified by the first program, each of them was searched again in WordSmith
Tools 5 to find the number of texts in which it had been used. Only those four-word
combinations that had been used at least ten times and in at least five
different texts counted as lexical bundles, no matter how frequent they were (Biber et al., 1999). This was to guard
against idiosyncratic and repetitive uses of the same bundle in the same text by the same
Functional analysis of bundles
The focus of this study was on four-word it bundles because previous research has shown that
they are far more common than five-word strings and offer a wider range of structures and
functions than three-word bundles (Cortes, 2004). Bundles are essentially extended collocations
defined by their frequency of occurrence and breadth of use, but the actual frequency cut-offs
are somewhat arbitrary. This study took a conservative approach by setting a minimum
frequency of 10 times per million words and occurrence in at least 10% of texts, i.e., a word
combination had to appear in five or more texts to be regarded as a lexical bundle.
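As a rough illustration of the identification criteria described above, the following Python sketch extracts four-word sequences beginning with it and keeps those that reach a normalized frequency threshold and a minimum number of texts. It is not the procedure used in this study, which relied on AntConc and WordSmith Tools; the function name find_it_bundles, the simple regular-expression tokenizer, and the default thresholds (10 per million words, five texts) are assumptions made for the example. The sketch also ignores sentence boundaries and cannot tell anticipatory it from referential it, so its output would still need manual checking in context.

```python
import re
from collections import Counter, defaultdict


def find_it_bundles(texts, n=4, min_per_million=10, min_texts=5):
    """Return {bundle: (raw_freq, freq_per_million, n_texts)}.

    `texts` is a list of strings, one per corpus text (hypothetical input
    format). Only n-word sequences whose first word is "it" are counted.
    """
    freq = Counter()                # raw frequency of each candidate sequence
    text_range = defaultdict(set)   # ids of the texts each sequence occurs in
    total_words = 0

    for text_id, text in enumerate(texts):
        # Crude tokenizer (an assumption): lowercase alphabetic words only.
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
        total_words += len(tokens)
        for i in range(len(tokens) - n + 1):
            if tokens[i] == "it":
                gram = " ".join(tokens[i:i + n])
                freq[gram] += 1
                text_range[gram].add(text_id)

    bundles = {}
    for gram, count in freq.items():
        per_million = count * 1_000_000 / total_words
        n_texts = len(text_range[gram])
        # Keep a sequence only if it meets both the frequency and range cut-offs.
        if per_million >= min_per_million and n_texts >= min_texts:
            bundles[gram] = (count, per_million, n_texts)
    return bundles
```

The range check is what prevents a sequence repeated many times in a single text from qualifying, which is the purpose of the five-text requirement.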
The data were analyzed in three steps. First, all anticipatory it lexical bundles were
identified in the two corpora along with their actual frequencies and the number of texts in
which they had been used. Second, using a functional typology of it-clauses developed by
Hewings and Hewings (2002) (see Table 3), together with the AntConc 3.2.1w concordancer (Anthony,
2007) and WordSmith Tools 5 (Scott, 2008) for the quantitative analysis of lexical
bundles, an attempt was made to probe the contexts in which bundles had been used in order to decide
on their predominant functions. This was done by both authors until they reached
100% agreement on all cases. In the third stage, the results were compared to determine
the extent to which research articles in applied linguistics were different from and/or similar to
postgraduate writing in terms of the range, frequency, and function of anticipatory it bundles. It
must be noted that while there are already some functional classifications of lexical bundles
(e.g., Cortes, 2002; Biber et al., 2004; Hyland, 2008a, 2008b), Hewings and Hewings'
functional taxonomy of it-clauses (2002) was used in this study since it deals specifically with
the interpersonal functions of this structural group.
Lexical bundles in applied linguistics published writing
Table 4 shows anticipatory "it" lexical bundles in the corpus of published writing in applied
linguistics along with their frequency and the number of texts in which they had been used. A
total of seventeen different it-bundles were drawn from this corpus. The overall frequency of these
bundles was 449, amounting to 0.036% of the whole corpus. In terms of function, this corpus
drew most heavily on attitude markers (43.20%) and least on attribution markers
(3.80%) (see Table 5). Some of the most frequent it-bundles were: it is important to (88
times), it should be noted (40 times), it is possible that (38 times), and it is difficult to (36
times). A large number of anticipatory "it" lexical bundles in this corpus also had the pattern
it + V + adjective + that/to. It also seemed that the use of such bundles helped published writers
in applied linguistics to encode different interpersonal meanings. The following
examples from this corpus show the use of some of these bundles by published writers:
(1) As a result of these experiences, it is possible that these students retrospectively
constructed the mainstream basic writing section as being “for American students” and
assumed that such an environment would have been more stressful for them than the
(2) It may be that students in the sciences, all PhD students in our case, focused more on the
explicit goals of the courses, which answer an urgent need to publish; others seemed rather
more open to acknowledging more personal gains.
(3) It is important to emphasize in this section that although the majority of the words that
remind us of a non-Spanish spelling are grouped among those which form their plural by
adding the suffix -s, we have found two examples of zero plural morpheme: Bluetooth and
(4) By way of final comment, it is interesting to note that the results of the study are
compatible with a view of language learning that distinguishes the acquisitional processes
involved in the development of implicit L2 knowledge from the general deductive learning
strategies involved in the development of explicit knowledge.
Lexical bundles in applied linguistics postgraduate writing
As shown in Table 6, there were again seventeen different anticipatory "it" lexical bundles in
the corpus of postgraduate writing: it was found that, it is important to, and it should be noted
were some of the more frequent lexical bundles used by postgraduate students. The overall
frequency of all it-bundles in this corpus was 354, covering 0.038% of the whole corpus.
In raw terms, this was lower than the 449 occurrences found in
applied linguistics research articles, although the proportion of the corpus covered was slightly
higher (0.038% versus 0.036%). Interestingly, there were some
bundles in this corpus (i.e., it should be mentioned, it was revealed that, and it is assumed
that) that were used only by postgraduate students, not published writers, in applied
linguistics.
Functional analysis also showed that postgraduate students, like published writers, were
able to employ lexical bundles in the discourse to serve a wide variety of functions
(see Table 7). As can be seen, among the five categories, 34.45% of all it-bundles were
sequences expressing epistemic meanings. Emphatics were the second most frequent group
of bundles, covering around 26% of all it-bundles, followed by attitude
markers (19.77%) and hedges (16.37%). Finally, the category of attribution
was found to be the least used, with a share of 3.38%. The following examples show
some of these different uses:
(5) In general, it seems that the newspapers through the language used and, more specifically,
through the sequence of discursive features that include transitivity, thematization,
lexicalization, and modality encode and reinforce asymmetries between EU and Iran in their
representation, in the context of west-dominated international politics.
(6) Also, according to the suggestions, it is possible to speculate the meaning of unknown
words when 95 percent of the words in the text are familiar to the reader.
(7) Thus, when authors use expressions such as my purpose for you in this chapter is to, it is
important to note that, perhaps, and surprisingly, they are using metadiscourse.
(8) Thus, although cognates seem to be better remembered than non-cognates, it is not clear
that this is due to their sharing a memory representation, as there is a great deal of debate
over how bilingual memory is organized.
Comparisons in terms of variety and frequency of bundles
Probably the most surprising finding of this study was the similarity between the
two corpora under investigation in terms of the range of it-bundles employed. Although the
number of texts in the corpus of applied linguistics articles was six times that
of postgraduate writing, the two corpora were very similar in terms of variety. Out of
seventeen (17) bundles used in applied linguistics research articles, fifty-three percent (53%)
were used in the other corpus too. Table 8 shows the it-bundles shared by the two corpora. The
results also showed that the frequency of it-bundles was almost the same in the
corpus of applied linguistics published writing and the corpus of postgraduate writing (368
and 386, respectively), as shown in Table 9.
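The range comparison above amounts to a set overlap. The Python sketch below shows the calculation on an illustrative subset made up only of bundles named in this paper; it is not the full content of Table 8, and the function names shared_bundles and overlap_percent are hypothetical.

```python
def shared_bundles(published, postgraduate):
    """Bundles that occur in both lists, in alphabetical order."""
    return sorted(set(published) & set(postgraduate))


def overlap_percent(published, postgraduate):
    """Share (%) of the published-writing bundles also found in postgraduate writing."""
    published = set(published)
    return 100 * len(published & set(postgraduate)) / len(published)


# Illustrative subset only (an assumption, not the complete bundle lists).
published = [
    "it is important to", "it should be noted", "it is interesting to",
    "it is important that", "it is hoped that",
]
postgraduate = [
    "it is important to", "it should be noted", "it should be mentioned",
    "it was revealed that", "it is assumed that",
]

print(shared_bundles(published, postgraduate))
print(overlap_percent(published, postgraduate))
```

With the full lists, nine shared bundles out of seventeen would give 100 × 9 / 17 ≈ 53%, which matches the reported figure; the count of nine is inferred from that percentage rather than stated in the text.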
Comparisons in terms of functions of bundles
In terms of generic differences in the variety of bundles used in each major functional
category, it was found that the variety of it-bundles serving as hedges and attitude markers in
applied linguistics published writing was greater than that in postgraduate writing. In the
case of emphatics and attributions, there was only a slight difference, whereas for epistemic meanings it
was postgraduate writing that made considerably heavier use of such bundles.
There were attitude markers (i.e., it is interesting to, it is important that, and it is hoped
that) that were used only by published writers in applied linguistics. Especially notable was
the higher frequency of it is important to in the corpus of research articles. It is difficult to
was another bundle used more heavily by applied linguistics writers.
Interestingly, and in contrast to some findings of previous research (e.g., Hyland, 2008a,
2008b; Cortes, 2004), postgraduate students, who might not have established themselves as
members of their disciplinary communities, were found to be confident in using those
sequences that expressed emphasis. This showed that postgraduate students could
express their attitudinal meaning in a straightforward manner.
Discussion and conclusion
Postgraduate students' relatively frequent use of anticipatory it bundles in their writing could
be taken as a surprising result of this study, as previous research (e.g., Cortes, 2004) had
shown that students tended to rely less on bundles in the development of their discourse. The
analysis of the corpus of postgraduate writing showed that the number of different
lexical bundles used by students in their writing was almost as large as that used by
published writers. It seemed that students, at both the master's and doctoral levels, could
handle the use of anticipatory it lexical bundles for a wide variety of discursive functions.
However, while this relatively frequent use of it bundles with metadiscursive functions could
be indicative of writing expertise and disciplinary growth, it can also be argued that the heavy
use of such a wide variety of bundles may not always be a sign of proficient language use and
disciplinary expertise. Less proficient language users may need to rely more on formulaic
expressions like lexical bundles. Research article writers, apart from lexical bundles, may
rely on other resources like specialized vocabulary, diverse word choices, conjunctions,
discourse markers, and manipulation of syntactic devices to develop their arguments.
Students' relatively frequent use of anticipatory it bundles could also be due to the fact
that they have already been exposed to such word sequences many times in their prior
reading of the published applied linguistics literature. There is almost no doubt that postgraduate
students have repeatedly encountered different lexical bundles in the research articles
they have read while conducting and writing up their own research. Furthermore, given
that anticipatory it lexical bundles are very pervasive in university written language (Biber et al.,
1999; Biber & Barbieri, 2007) and that they may have a formulaic status (Wray, 2000), the use of
such word combinations may not present students with a very difficult task.
Also, it has been postulated that lexical bundles are stored in and retrieved whole from
memory through holistic rather than analytical processes (Conklin & Schmitt, 2008) and that,
therefore, postgraduate students may have little if any difficulty not only in understanding but
also in producing lexical bundles. There may be a processing advantage in the use of lexical
bundles, as some formulaic sequences have been shown to be easier to use (Conklin &
Schmitt, 2008). It can also be postulated that lexical bundles act as handy short-cuts or
frames (Biber & Barbieri, 2007) through which writers can scaffold their propositional
meanings with relative ease. However, automatic acquisition of lexical bundles should not
be taken for granted, as this study also showed that there were some lexical bundles in applied
linguistics published writing that students drew on infrequently or did not
use at all. These word sequences are not idiomatic in meaning and hence may be easy
to understand, but they may not be marked or perceptually salient. Consequently,
there may still be a need to incorporate them in L2 syllabi or EAP (English for Academic
Purposes) courses for an increased pedagogical focus on lexical bundles. This is especially
important for those students who need to understand and use such lexical bundles in their
future target genres (Hyland, 2008b).
Biber, D. (2006). University language: A corpus-based study of spoken and written registers.
Biber, D., & Barbieri, F. (2007). Lexical bundles in university spoken and written registers.
English for Specific Purposes, 26, 263-286.
Biber, D., Conrad, S., & Cortes, V. (2004). If you look at …: Lexical bundles in university
teaching and textbooks. Applied Linguistics, 25, 371–405.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of
spoken and written English. Harlow: Pearson.
Conklin, K., & Schmitt, N. (2008). Formulaic sequences: Are they processed more quickly
than nonformulaic language by native and nonnative speakers? Applied Linguistics,
Cortes, V. (2002). Lexical bundles in academic writing in history and biology. Unpublished
Doctoral dissertation: Northern Arizona University, Arizona.
Cortes, V. (2004). Lexical bundles in published and student disciplinary writing: Examples
from history and biology. English for Specific Purposes, 23 (4), 397–423.
Hewings, M. & Hewings, A. (2002). "It is interesting to note that…..": A comparative study
of anticipatory 'it' in student and published writing. English for Specific Purposes, 21,
Hyland, K. (2000). Disciplinary discourses: Social interaction in academic writing. London:
Hyland, K. (2008a). As can be seen: Lexical bundles and disciplinary variation. English for
Specific Purposes, 27(1), 4-21.
Hyland, K. (2008b). Academic clusters: text patterning in published and postgraduate
writing. International Journal of Applied Linguistics, 18 (1), 41-62.
Jacobs, R.A. (1995). English syntax: A grammar for English language professionals. New
York: Oxford University Press.
Jalali, H. (2009). Lexical bundles in applied linguistics: Variations within a single discipline.
Unpublished doctoral dissertation: Isfahan University, Isfahan.
Jalali, H. (2013). Lexical bundles in applied linguistics: Variations across postgraduate
genres. Journal of Foreign Language Teaching and Translation Studies, 2 (2), 1-29.
Jalali, H., Eslami Rasekh, A. & Tavangar Rizi, M. (2008). Lexical bundles and
intradisciplinary variation: The case of applied linguistics. Iranian Journal of Language
Studies, 2(4), 447-484.
Jalali, H., Eslami Rasekh, A. & Tavangar Rizi, M. (2009). Anticipatory 'it' lexical bundles: A
comparative study of student and published writing in applied linguistics. Iranian
Journal of Language Studies, 3 (2), 177-194.
Jalali, H., & Ghayoomi, S. (2010). A comparative qualitative study of lexical bundles in three
academic genres of applied linguistics. Modern Journal of Applied Linguistics, 2 (4),
Ruiying, Y., & Allison, D. (2003). Research articles in applied linguistics: Moving from
results to conclusions. English for Specific Purposes, 22 (4), 365-385.
Scott, M. (2008). Wordsmith Tools 5. Oxford: Oxford University Press.
Thompson, G. (2001). Interaction in academic writing: Learning to argue with the reader.
Applied Linguistics, 22(1), 58-78.
Wray, A. (2000). Formulaic sequences in second language teaching: Principle and practice.
Applied Linguistics, 21(4), 463-489.