Article published in:
International Journal of Learner Corpus Research: Online-First Articles
The more proficient the learners, the less sophisticated their L2 vocabulary?
The curious effect of the reference corpus on mean-frequency measures of lexical sophistication
Mean-frequency scores of lexical sophistication are used to evaluate written and spoken language production. They
are calculated using word frequencies extracted from a reference corpus. Using mixed-effects regression models, we analyse the
strength of the relationship between L2 proficiency and mean-frequency scores in spoken and written texts using reference corpora
representing different modes and registers. We control for task and topic effects. We observe that mean-frequency measures of
lexical sophistication are considerably more influenced by the mode and register of the reference corpus used to calculate these
scores than by language users’ proficiency level. Advanced language users produce more frequent vocabulary, typical of the target
register, in both spoken monologues and written essays. These results provide evidence in favour of a conceptual and
terminological shift from lexical sophistication to register appropriateness (as suggested by
Durrant & Brenchley, 2019) to refer to the construct captured by mean-frequency
scores of vocabulary use.
Keywords: lexical complexity, L2 proficiency, reference corpora, register appropriateness, language testing
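To make the calculation described in the abstract concrete, the following is a minimal sketch of a mean-frequency score computed against two different reference corpora. It is not the authors' procedure: the function name `mean_log_frequency`, the log10 per-million transformation, the choice to skip words absent from the reference list, and all counts are assumptions made for illustration only.

```python
import math


def mean_log_frequency(tokens, ref_counts, ref_size):
    """Mean log10 per-million frequency of the tokens found in ref_counts.

    Hypothetical helper: the log transformation and the skipping of
    unlisted tokens are illustrative assumptions, not the article's method.
    """
    scores = []
    for tok in tokens:
        count = ref_counts.get(tok.lower())
        if count:  # ignore tokens that do not occur in the reference corpus
            per_million = count / ref_size * 1_000_000
            scores.append(math.log10(per_million))
    return sum(scores) / len(scores) if scores else float("nan")


# Invented counts standing in for a spoken and a written reference corpus,
# each treated as containing 1,000,000 tokens.
spoken_ref = {"think": 3000, "people": 2500, "argue": 20, "smoking": 50}
written_ref = {"think": 800, "people": 1500, "argue": 300, "smoking": 60}

# Content words from a hypothetical learner text.
text = ["people", "argue", "smoking"]

spoken_score = mean_log_frequency(text, spoken_ref, 1_000_000)
written_score = mean_log_frequency(text, written_ref, 1_000_000)
```

With these invented counts, the same three words receive a higher mean-frequency score against the written list than against the spoken one, which illustrates how the choice of reference corpus alone can shift the score for an unchanged text.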
Article outline
- 1. Introduction
- 2. The problem
- 3. Hypotheses
- 4. Methodology
- 4.1 Data
- 4.2 Procedure
- 5. Results
- 5.1 Results from the ICNALE spoken monologues
- 5.2 Results from the ICNALE Written Essays
- 6. Discussion
- 6.1 Hypothesis 1: The relationship between L2 proficiency and mean-frequency scores of content words (CW)
- 6.2 Hypothesis 2: The effect of mode and register of the reference corpus on mean-frequency scores of content words (CW)
- 6.2.1 The ICNALE spoken monologues
- 6.2.2 The ICNALE Written Essays
- 6.3 Limitations and future research
- 7. Conclusions
- Open data badge and open code badge
- Notes
- References
Available under the Creative Commons Attribution (CC BY) 4.0 license.
For any use beyond this license, please contact the publisher.
Published online: 21 October 2024
https://doi.org/10.1075/ijlcr.23029.bot
References (62)
Balota, D. A., Yap, M. J., Hutchison, K. A., Cortese, M. J., Kessler, B., Loftis, B., Neely, J. H., Nelson, D. L., Simpson, G. B., & Treiman, R. (2007). The English lexicon project. Behavior Research Methods, 39(3), 445–459.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48.
Bell, H. M. (2003). Using frequency lists to assess L2 texts [PhD thesis, University of Wales]. [URL]
Berger, C., Crossley, S., & Skalicky, S. (2019). Using lexical features to investigate second language lexical decision performance. Studies in Second Language Acquisition, 41(5), 911–935.
Biber, D. (2023). Writing and speaking. In R. Horowitz (Ed.), The Routledge International Handbook of Research on Writing (2nd ed., pp. 535–548). Routledge.
Biber, D., Johansson, S., Leech, G. N., Conrad, S., & Finegan, E. (2021). Grammar of spoken and written English. John Benjamins.
BNC Consortium. (2007). The British National Corpus (XML) [Data set]. Oxford Text Archive. [URL]
Bottini, R. (2022). Lexical complexity in L2 English speech: Evidence from the Trinity Lancaster Corpus [PhD thesis]. Lancaster University.
Brezina, V., Hawtin, A., & McEnery, T. (2021). The written British National Corpus 2014 – design and comparability. Text & Talk, 41(5–6), 595–615.
Brysbaert, M., & New, B. (2009). Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977–990.
Council of Europe (Ed.). (2020). Common European framework of reference for languages: Learning, teaching, assessment. Companion volume. Council of Europe Publishing. [URL]
Crossley, S., Cobb, T., & McNamara, D. (2013). Comparing count-based and band-based indices of word frequency: Implications for active vocabulary research and pedagogical applications. System, 41(4), 965–981.
Crossley, S., Kyle, K., & Römer, U. (2019). Examining lexical and cohesion differences in discipline-specific writing using multi-dimensional analysis. In T. B. Sardinha & M. V. Pinto (Eds.), Multi-dimensional analysis: Research methods and current issues (pp. 189–216). Bloomsbury Academic.
Crossley, S., & McNamara, D. (2012). Predicting second language writing proficiency: The roles of cohesion and linguistic sophistication. Journal of Research in Reading, 35(2), 115–135.
Crossley, S., & McNamara, D. (2013). Applications of text analysis tools for spoken response grading. Language Learning & Technology, 17(2), 171–192.
Crossley, S., Salsbury, T., & McNamara, D. (2010). The development of polysemy and frequency use in English second language speakers: Polysemy and frequency use in English L2 speakers. Language Learning, 60(3), 573–605.
Crossley, S., Salsbury, T., & McNamara, D. (2012). Predicting the proficiency level of language learners using lexical indices. Language Testing, 29(2), 243–263.
Crossley, S., Salsbury, T., McNamara, D., & Jarvis, S. (2011). What is lexical proficiency? Some answers from computational models of speech data. TESOL Quarterly, 45(1), 182–193.
Dawson, N., Hsiao, Y., Tan, A., Banerji, N., & Nation, K. (2021). Features of lexical richness in children’s books: Comparisons with child-directed speech.
Dombi, J., Sydorenko, T., & Timpe-Laughlin, V. (2022). Common ground, cooperation, and recipient design in human-computer interactions. Journal of Pragmatics, 193, 4–20.
Durrant, P. (2014). Corpus frequency and second language learners’ knowledge of collocations: A meta-analysis. International Journal of Corpus Linguistics, 19(4), 443–477.
Durrant, P., & Brenchley, M. (2019). Development of vocabulary sophistication across genres in English children’s writing. Reading and Writing, 32(8), 1927–1953.
Durrant, P., & Durrant, A. (2022). Appropriateness as an aspect of lexical richness: What do quantitative measures tell us about children’s writing? Assessing Writing, 51, 100596.
Durrant, P., Moxley, J., & McCallum, L. (2019). Vocabulary sophistication in First-Year Composition assignments. International Journal of Corpus Linguistics, 24(1), 33–66.
Egbert, J. (2017). Corpus linguistics and language testing: Navigating uncharted waters. Language Testing, 34(4), 555–564.
Eguchi, M., & Kyle, K. (2020). Continuing to explore the multidimensional nature of lexical sophistication: The case of oral proficiency interviews. The Modern Language Journal, 104(2), 381–400.
Ellis, N. C. (2002). Frequency effects in language processing. Studies in Second Language Acquisition, 24(2), 143–188.
Ellis, N. C., Simpson-Vlach, R., & Maynard, C. (2008). Formulaic language in native and second language speakers: Psycholinguistics, corpus linguistics, and TESOL. TESOL Quarterly, 42(3), 375–396.
Fergadiotis, G., Wright, H. H., & West, T. M. (2013). Measuring lexical diversity in narrative discourse of people with aphasia. American Journal of Speech-Language Pathology / American Speech-Language-Hearing Association, 22(2).
Gablasova, D., & Bottini, R. (2022). Spoken learner corpora for language teaching. In R. Jablonkai & E. Csomay (Eds.), The Routledge handbook of corpora and English language teaching and learning (pp. 296–310). Routledge.
Gablasova, D., Harding, L., Brezina, V., & Dunlea, J. (2023, July). Talking to an imagined interlocutor: Interactional and interpersonal features of discourse in computer-mediated semi-direct speaking assessment. Corpus Linguistics 2023 Conference, Lancaster University (UK).
Gries, S. Th. (2015). The most under-used statistical method in corpus linguistics: Multi-level (and mixed-effects) models. Corpora, 10(1), 95–125.
Horst, M., & Collins, L. (2006). From faible to strong: How does their vocabulary grow? The Canadian Modern Language Review, 63(1), 83–106.
Ishikawa, S. (2023). The ICNALE guide: An introduction to a learner corpus study on Asian learners’ L2 English. Routledge.
Kim, M., Crossley, S., & Kyle, K. (2018). Lexical sophistication as a multidimensional phenomenon: Relations to second language lexical proficiency, development, and writing quality. The Modern Language Journal, 102(1), 120–141.
Kormos, J. (2011). Task complexity and linguistic and discourse features of narrative writing performance. Journal of Second Language Writing, 20(2), 148–161.
Kyle, K., & Crossley, S. (2015). Automatically assessing lexical sophistication: Indices, tools, findings, and application. TESOL Quarterly, 49(4), 757–786.
Kyle, K., & Crossley, S. (2016). The relationship between lexical sophistication and independent and source-based writing. Journal of Second Language Writing, 34, 12–24.
Kyle, K., Crossley, S., & Berger, C. (2018). The tool for the automatic analysis of lexical sophistication (TAALES): Version 2.0. Behavior Research Methods, 50, 1030–1046.
Le Foll, E. (2021). Register variation in school EFL textbooks. Register Studies, 3(2), 207–246.
Le Foll, E. (2022a). Making tea and mistakes: The functions of make in spoken English and textbook dialogues. In Z. Yin & E. Vine (Eds.), Multifunctionality in English: Corpora, language and academic literacy pedagogy (pp. 157–178). Routledge.
Le Foll, E. (2022b). Textbook English: A corpus-based analysis of the language of EFL textbooks used in secondary schools in France, Germany and Spain [PhD thesis, Osnabrück University].
Love, R., Dembry, C., Hardie, A., Brezina, V., & McEnery, T. (2017). The spoken BNC2014. International Journal of Corpus Linguistics, 22(3), 319–344.
Lu, X. (2012). The relationship of lexical richness to the quality of ESL learners’ oral narratives. The Modern Language Journal, 96(2), 190–208.
McNamara, D., Crossley, S., & McCarthy, P. (2010). Linguistic features of writing quality. Written Communication, 27(1), 57–86.
Monteiro, K., Crossley, S., & Kyle, K. (2020). In search of new benchmarks: Using L2 lexical frequency and contextual diversity indices to assess second language writing. Applied Linguistics, 41(2), 280–300.
Nakatsuhara, F., Khabbazbashi, N., & Inoue, C. (2021). Assessing speaking. In G. Fulcher & L. Harding (Eds.), The Routledge handbook of language testing (pp. 209–222). Routledge.
Nesi, H. (2001). A corpus-based analysis of academic lectures across disciplines. In J. Cotterill & A. Ife (Eds.), Language across boundaries (pp. 201–218). British Association for Applied Linguistics in association with Continuum Press.
Nesi, H., & Gardner, S. (2012). Genres across the disciplines: Student writing in higher education. Cambridge University Press.
Ockey, G. J., & Chukharev-Hudilainen, E. (2021). Human versus computer partner in the paired oral discussion test. Applied Linguistics, 42(5), 924–944.
OED. (2023). sophistication, n. | sophisticated, adj. In Oxford English Dictionary. Oxford University Press.
Ong, J., & Zhang, L. J. (2010). Effects of task complexity on the fluency and lexical complexity in EFL students’ argumentative writing. Journal of Second Language Writing, 19(4), 218–233.
Pallotti, G. (2020). Measuring complexity, accuracy, and fluency (CAF). In P. Winke & T. Brunfaut (Eds.), The Routledge handbook of Second Language Acquisition and language testing (pp. 201–210). Routledge.
Paquot, M. (2019). The phraseological dimension in interlanguage complexity research. Second Language Research, 35(1), 121–145.
Paquot, M., Gablasova, D., Brezina, V., & Naets, H. (2022). Phraseological complexity in EFL learners’ spoken production across proficiency levels. In S. Götz & A. Leńko-Szymańska (Eds.), Complexity, accuracy and fluency in learner corpus research (pp. 115–136). John Benjamins.
Saito, K., Suzuki, S., Oyama, T., & Akiyama, Y. (2021). How does longitudinal interaction promote second language speech learning? Roles of learner experience and proficiency levels. Second Language Research, 37(4), 547–571.
Salsbury, T., Crossley, S., & McNamara, D. (2011). Psycholinguistic word information in second language oral discourse. Second Language Research, 27(3), 343–360.