Article in: International Journal of Learner Corpus Research: Online-First Articles
The more proficient the learners, the less sophisticated their L2 vocabulary?
The curious effect of the reference corpus on mean-frequency measures of lexical sophistication
Mean-frequency scores of lexical sophistication are used to evaluate written and spoken language production. They are calculated from word frequencies extracted from a reference corpus. Using mixed-effects regression models, we analyse the strength of the relationship between L2 proficiency and mean-frequency scores in spoken and written texts, computing these scores with reference corpora that represent different modes and registers. We control for task and topic effects. We observe that mean-frequency measures of lexical sophistication are considerably more influenced by the mode and register of the reference corpus used to calculate them than by language users’ proficiency level. Advanced language users produce more frequent vocabulary, typical of the target register, in both spoken monologues and written essays. These results provide evidence in favour of a conceptual and terminological shift from lexical sophistication to register appropriateness (as suggested by Durrant & Brenchley, 2019) as the label for the construct captured by mean-frequency scores of vocabulary use.
Keywords: lexical complexity, L2 proficiency, reference corpora, register appropriateness, language testing
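To make the dependence on the reference corpus concrete, the sketch below computes a mean-frequency score for the same short list of content words against two different frequency lists. It is a minimal illustration under stated assumptions, not the procedure used in the article: the function name `mean_log_frequency`, the log10 transform of per-million frequencies, the choice to skip words missing from the reference list, and all frequency values are hypothetical, and the content words are assumed to have been lemmatised and filtered beforehand.

```python
import math
from typing import Mapping, Sequence


def mean_log_frequency(content_words: Sequence[str],
                       reference_freqs: Mapping[str, float]) -> float:
    """Mean log10 frequency (per million words) of a text's content words.

    Hypothetical assumptions, not the article's documented procedure:
    - `content_words` are already lemmatised and restricted to content words;
    - words absent from the reference list are skipped rather than scored.
    """
    logs = [math.log10(reference_freqs[w])
            for w in content_words if w in reference_freqs]
    if not logs:
        raise ValueError("no content word found in the reference list")
    return sum(logs) / len(logs)


# Invented toy frequencies per million words (NOT real corpus counts):
# one list stands in for a spoken reference corpus, one for a written one.
spoken_ref = {"think": 1000.0, "really": 1000.0, "good": 100.0, "argue": 10.0}
written_ref = {"think": 100.0, "really": 10.0, "good": 100.0, "argue": 100.0}

text = ["think", "really", "good", "argue"]
print(mean_log_frequency(text, spoken_ref))   # prints 2.25
print(mean_log_frequency(text, written_ref))  # prints 1.75
```

With these invented values, "think" and "really" are far more frequent in the spoken-style list, so the same four words receive a higher mean score (that is, look less "sophisticated") against that list than against the written-style one. This mirrors, in miniature, the kind of reference-corpus dependence that the abstract reports.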
Article outline
- 1. Introduction
- 2. The problem
- 3. Hypotheses
- 4. Methodology
- 4.1 Data
- 4.2 Procedure
- 5. Results
- 5.1 Results from the ICNALE spoken monologues
- 5.2 Results from the ICNALE Written Essays
- 6. Discussion
- 6.1 Hypothesis 1: The relationship between L2 proficiency and mean-frequency scores of content words (CW)
- 6.2 Hypothesis 2: The effect of mode and register of the reference corpus on mean-frequency scores of content words (CW)
- 6.2.1 The ICNALE spoken monologues
- 6.2.2 The ICNALE Written Essays
- 6.3 Limitations and future research
- 7. Conclusions
- Open data badge and open code badge
- Notes
- References
This content is being prepared for publication; it may be subject to changes.
References (62)
Balota, D. A., Yap, M. J., Hutchison, K. A., Cortese, M. J., Kessler, B., Loftis, B., Neely, J. H., Nelson, D. L., Simpson, G. B., & Treiman, R. (2007). The English lexicon project. Behavior Research Methods, 39(3), 445–459.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48.
Bell, H. M. (2003). Using frequency lists to assess L2 texts [PhD thesis, University of Wales]. [URL]
Berger, C., Crossley, S., & Skalicky, S. (2019). Using lexical features to investigate second language lexical decision performance. Studies in Second Language Acquisition, 41(5), 911–935.
Biber, D. (2023). Writing and speaking. In R. Horowitz (Ed.), The Routledge International Handbook of Research on Writing (2nd ed., pp. 535–548). Routledge.
Biber, D., Johansson, S., Leech, G. N., Conrad, S., & Finegan, E. (2021). Grammar of spoken and written English. John Benjamins.
BNC Consortium. (2007). The British National Corpus (XML) [Data set]. Oxford Text Archive. [URL]
Bottini, R. (2022). Lexical complexity in L2 English speech: Evidence from the Trinity Lancaster Corpus [PhD thesis, Lancaster University].
Brezina, V., Hawtin, A., & McEnery, T. (2021). The written British National Corpus 2014 – design and comparability. Text & Talk, 41(5–6), 595–615.
Brysbaert, M., & New, B. (2009). Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977–990.
Council of Europe (Ed.). (2020). Common European framework of reference for languages: Learning, teaching, assessment. Companion volume. Council of Europe Publishing. [URL]
Crossley, S., Cobb, T., & McNamara, D. (2013). Comparing count-based and band-based indices of word frequency: Implications for active vocabulary research and pedagogical applications. System, 41(4), 965–981.
Crossley, S., Kyle, K., & Römer, U. (2019). Examining lexical and cohesion differences in discipline-specific writing using multi-dimensional analysis. In T. B. Sardinha & M. V. Pinto (Eds.), Multi-dimensional analysis: Research methods and current issues (pp. 189–216). Bloomsbury Academic.
Crossley, S., & McNamara, D. (2012). Predicting second language writing proficiency: The roles of cohesion and linguistic sophistication. Journal of Research in Reading, 35(2), 115–135.
Crossley, S., & McNamara, D. (2013). Applications of text analysis tools for spoken response grading. Language Learning & Technology, 17(2), 171–192.
Crossley, S., Salsbury, T., & McNamara, D. (2010). The development of polysemy and frequency use in English second language speakers. Language Learning, 60(3), 573–605.
Crossley, S., Salsbury, T., & McNamara, D. (2012). Predicting the proficiency level of language learners using lexical indices. Language Testing, 29(2), 243–263.
Crossley, S., Salsbury, T., McNamara, D., & Jarvis, S. (2011). What is lexical proficiency? Some answers from computational models of speech data. TESOL Quarterly, 45(1), 182–193.
Dawson, N., Hsiao, Y., Tan, A., Banerji, N., & Nation, K. (2021). Features of lexical richness in children’s books: Comparisons with child-directed speech.
Dombi, J., Sydorenko, T., & Timpe-Laughlin, V. (2022). Common ground, cooperation, and recipient design in human-computer interactions. Journal of Pragmatics, 193, 4–20.
Durrant, P. (2014). Corpus frequency and second language learners’ knowledge of collocations: A meta-analysis. International Journal of Corpus Linguistics, 19(4), 443–477.
Durrant, P., & Brenchley, M. (2019). Development of vocabulary sophistication across genres in English children’s writing. Reading and Writing, 32(8), 1927–1953.
Durrant, P., & Durrant, A. (2022). Appropriateness as an aspect of lexical richness: What do quantitative measures tell us about children’s writing? Assessing Writing, 51, 100596.
Durrant, P., Moxley, J., & McCallum, L. (2019). Vocabulary sophistication in First-Year Composition assignments. International Journal of Corpus Linguistics, 24(1), 33–66.
Egbert, J. (2017). Corpus linguistics and language testing: Navigating uncharted waters. Language Testing, 34(4), 555–564.
Eguchi, M., & Kyle, K. (2020). Continuing to explore the multidimensional nature of lexical sophistication: The case of oral proficiency interviews. The Modern Language Journal, 104(2), 381–400.
Ellis, N. C. (2002). Frequency effects in language processing. Studies in Second Language Acquisition, 24(2), 143–188.
Ellis, N. C., Simpson-Vlach, R., & Maynard, C. (2008). Formulaic language in native and second language speakers: Psycholinguistics, corpus linguistics, and TESOL. TESOL Quarterly, 42(3), 375–396.
Fergadiotis, G., Wright, H. H., & West, T. M. (2013). Measuring lexical diversity in narrative discourse of people with aphasia. American Journal of Speech-Language Pathology, 22(2).
Gablasova, D., & Bottini, R. (2022). Spoken learner corpora for language teaching. In R. Jablonkai & E. Csomay (Eds.), The Routledge handbook of corpora and English language teaching and learning (pp. 296–310). Routledge.
Gablasova, D., Harding, L., Brezina, V., & Dunlea, J. (2023, July). Talking to an imagined interlocutor: Interactional and interpersonal features of discourse in computer-mediated semi-direct speaking assessment. Corpus Linguistics 2023 Conference, Lancaster University (UK).
Gries, S. Th. (2015). The most under-used statistical method in corpus linguistics: Multi-level (and mixed-effects) models. Corpora, 10(1), 95–125.
Horst, M., & Collins, L. (2006). From faible to strong: How does their vocabulary grow? The Canadian Modern Language Review, 63(1), 83–106.
Ishikawa, S. (2023). The ICNALE guide: An introduction to a learner corpus study on Asian learners’ L2 English. Routledge.
Kim, M., Crossley, S., & Kyle, K. (2018). Lexical sophistication as a multidimensional phenomenon: Relations to second language lexical proficiency, development, and writing quality. The Modern Language Journal, 102(1), 120–141.
Kormos, J. (2011). Task complexity and linguistic and discourse features of narrative writing performance. Journal of Second Language Writing, 20(2), 148–161.
Kyle, K., & Crossley, S. (2015). Automatically assessing lexical sophistication: Indices, tools, findings, and application. TESOL Quarterly, 49(4), 757–786.
Kyle, K., & Crossley, S. (2016). The relationship between lexical sophistication and independent and source-based writing. Journal of Second Language Writing, 34, 12–24.
Kyle, K., Crossley, S., & Berger, C. (2018). The tool for the automatic analysis of lexical sophistication (TAALES): Version 2.0. Behavior Research Methods, 50, 1030–1046.
Le Foll, E. (2021). Register variation in school EFL textbooks. Register Studies, 3(2), 207–246.
Le Foll, E. (2022a). Making tea and mistakes: The functions of make in spoken English and textbook dialogues. In Z. Yin & E. Vine (Eds.), Multifunctionality in English: Corpora, language and academic literacy pedagogy (pp. 157–178). Routledge.
Le Foll, E. (2022b). Textbook English: A corpus-based analysis of the language of EFL textbooks used in secondary schools in France, Germany and Spain [PhD thesis, Osnabrück University].
Love, R., Dembry, C., Hardie, A., Brezina, V., & McEnery, T. (2017). The spoken BNC2014. International Journal of Corpus Linguistics, 22(3), 319–344.
Lu, X. (2012). The relationship of lexical richness to the quality of ESL learners’ oral narratives. The Modern Language Journal, 96(2), 190–208.
McNamara, D., Crossley, S., & McCarthy, P. (2010). Linguistic features of writing quality. Written Communication, 27(1), 57–86.
Monteiro, K., Crossley, S., & Kyle, K. (2020). In search of new benchmarks: Using L2 lexical frequency and contextual diversity indices to assess second language writing. Applied Linguistics, 41(2), 280–300.
Nakatsuhara, F., Khabbazbashi, N., & Inoue, C. (2021). Assessing speaking. In G. Fulcher & L. Harding (Eds.), The Routledge handbook of language testing (pp. 209–222). Routledge.
Nesi, H. (2001). A corpus-based analysis of academic lectures across disciplines. In J. Cotterill & A. Ife (Eds.), Language across boundaries (pp. 201–218). British Association for Applied Linguistics in association with Continuum Press.
Nesi, H., & Gardner, S. (2012). Genres across the disciplines: Student writing in higher education. Cambridge University Press.
Ockey, G. J., & Chukharev-Hudilainen, E. (2021). Human versus computer partner in the paired oral discussion test. Applied Linguistics, 42(5), 924–944.
OED. (2023). sophistication, n. | sophisticated, adj. In Oxford English Dictionary. Oxford University Press.
Ong, J., & Zhang, L. J. (2010). Effects of task complexity on the fluency and lexical complexity in EFL students’ argumentative writing. Journal of Second Language Writing, 19(4), 218–233.
Pallotti, G. (2020). Measuring complexity, accuracy, and fluency (CAF). In P. Winke & T. Brunfaut (Eds.), The Routledge handbook of Second Language Acquisition and language testing (pp. 201–210). Routledge.
Paquot, M. (2019). The phraseological dimension in interlanguage complexity research. Second Language Research, 35(1), 121–145.
Paquot, M., Gablasova, D., Brezina, V., & Naets, H. (2022). Phraseological complexity in EFL learners’ spoken production across proficiency levels. In S. Götz & A. Leńko-Szymańska (Eds.), Complexity, accuracy and fluency in learner corpus research (pp. 115–136). John Benjamins.
Saito, K., Suzuki, S., Oyama, T., & Akiyama, Y. (2021). How does longitudinal interaction promote second language speech learning? Roles of learner experience and proficiency levels. Second Language Research, 37(4), 547–571.
Salsbury, T., Crossley, S., & McNamara, D. (2011). Psycholinguistic word information in second language oral discourse. Second Language Research, 27(3), 343–360.