Recently developed tools that quickly and reliably quantify vocabulary use on a range of measures open up new
possibilities for understanding the construct of vocabulary sophistication. To take this work forward, we need to understand how
these different measures relate to each other and to human readers’ perceptions of texts. This study applied 356 quantitative
measures of vocabulary use generated by an automated vocabulary analysis tool (Kyle & Crossley, 2015) to a large corpus of
assignments written for First-Year Composition courses at a university in the United States. Results suggest that the majority of
measures can be reduced to a much smaller set without substantial loss of information. However, distinctions need to be retained
between measures based on content vs. function words and between different measures of collocational strength. Overall, correlations
with grades are reliable but weak.
(2014) Quantifying the development of phraseological competence in L2 English writing: An automated approach. Journal of Second Language Writing, 26, 28–41.
Biber, D.
(1988) Variation Across Speech and Writing. Cambridge: Cambridge University Press.
BNC Consortium
(2007) British National Corpus, version 3 (BNC XML ed.). Retrieved from [URL] (Last accessed February 2019).
Brown, G. D. A.
(1984) A frequency count of 190,000 words in the London-Lund corpus of English conversation. Behavior Research Methods, Instruments, & Computers, 16(6), 502–532.
Brysbaert, M., & New, B.
(2009) Moving beyond Kucera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977–990.
Bulté, B., & Housen, A.
(2014) Conceptualizing and measuring short-term changes in L2 writing complexity. Journal of Second Language Writing, 26, 42–65.
Burnage, G.
(1990) CELEX: A Guide for Users. Nijmegen: CELEX – Centre for Lexical Information.
Coxhead, A.
(2000) A new academic wordlist. TESOL Quarterly, 34(2), 213–238.
Crossley, S. A., Cai, Z., & McNamara, D.
(2012) Syntagmatic, paradigmatic, and automatic n-gram approaches to assessing essay quality. In G. M. Youngblood & P. M. McCarthy (Eds.), Proceedings of the Twenty-Fifth International Florida Artificial Intelligence Research Society Conference (pp. 214–219). Palo Alto, CA: The AAAI Press.
Crossley, S. A., DeFore, C., Kyle, K., Dai, J., & McNamara, D.
(2013) Paragraph specific n-gram approaches to automatically assessing essay quality. In S. K. D’Mello, R. A. Calvo & A. Olney (Eds.), Proceedings of the 6th International Conference on Educational Data Mining (pp. 216–219). Heidelberg: Springer. Retrieved from [URL] (last accessed February 2019).
Crossley, S. A., Salsbury, T., McNamara, D., & Jarvis, S.
(2010) Predicting lexical proficiency in language learner texts using computational indices. Language Testing, 28(4), 561–580.
Crossley, S. A., Weston, J. L., Sullivan, S. T. M., & McNamara, D.
(2011) The development of writing proficiency as a function of grade level: A linguistic analysis. Written Communication, 28(3), 282–311.
Cumming, A., Kantor, R., Baba, K., Erdosy, U., Eouanzoui, K., & James, M.
(2005) Differences in written discourse in independent and integrated prototype tasks for next generation TOEFL. Assessing Writing, 10(1), 5–43.
Daller, H., Turlik, J., & Weir, I.
(2013) Vocabulary acquisition and the learning curve. In S. Jarvis & H. Daller (Eds.), Vocabulary Knowledge: Human Ratings and Automated Measures (pp. 185–215). Amsterdam/Philadelphia, PA: John Benjamins.
Davies, M.
(2008–) The Corpus of Contemporary American English: 450 million words, 1990–present. Retrieved from [URL] (last accessed February 2019).
(in press). Corpus research on the development of children’s writing in L1 English. In A. Glaznieks, A. Abel, V. Lyding, & V. Nicolas (Eds.), Corpora and Language in Use: Proceedings of the Learner Corpus Research Conference 2017. Louvain: Presses Universitaires de Louvain.
Durrant, P., & Schmitt, N.
(2009) To what extent do native and non-native writers make use of collocations? International Review of Applied Linguistics, 47(2), 157–177.
Garner, J., Crossley, S. A., & Kyle, K.
(2018) Beginning and intermediate L2 writers’ use of N-grams: An association measures study. International Review of Applied Linguistics. Advance online publication.
Golub, L. S., & Frederick, W. C.
(1979) Linguistic Structures in the discourse of fourth and sixth graders. Madison, WI: Center for Cognitive Learning, The University of Wisconsin.
Graesser, A. C., McNamara, D., Louwerse, M. M., & Cai, Z.
(2004) Coh-Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments, & Computers, 36(2), 193–202.
Granger, S., & Bestgen, Y.
(2014) The use of collocations by intermediate vs. advanced non-native writers: A bigram-based study. International Review of Applied Linguistics, 52(3), 229–252.
(1981) Syntactic maturity, mechanics, and vocabulary as predictors of quality ratings. Research in the Teaching of English, 15(1), 75–85.
Guo, L., Crossley, S. A., & McNamara, D.
(2013) Predicting human judgments of essay quality in both integrated and independent second language writing samples: A comparison study. Assessing Writing, 18(3), 218–238.
(2003) Exploring multiple profiles of highly rated learner compositions. Journal of Second Language Writing, 12, 377–403.
Kim, J.-Y.
(2014) Predicting L2 writing proficiency using linguistic complexity measures: A corpus-based study. English Teaching, 69(4), 27–51.
Kim, M., Crossley, S. A., & Kyle, K.
(2018) Lexical sophistication as a multidimensional phenomenon: Relations to second language lexical proficiency, development, and writing quality. The Modern Language Journal, 102(1), 120–141.
Knoch, U., Rouhshad, A., Oon, S. P., & Storch, N.
(2015) What happens to ESL students’ writing after three years of study at an English medium university? Journal of Second Language Writing, 28, 39–52.
Knoch, U., Rouhshad, A., & Storch, N.
(2014) Does the writing of undergraduate ESL students develop after one year of study in an English-medium university? Assessing Writing, 21, 1–17.
Kucera, H. & Francis, W.
(1967) Computational Analysis of Present-day American English. Providence, RI: Brown University Press.
Kyle, K.
(2017) Modelling quality in source-based texts. Retrieved from [URL] (last accessed February 2019).
(2016) The relationship between lexical sophistication and independent and source-based writing. Journal of Second Language Writing, 34, 12–24.
Malvern, D., & Richards, B.
(2002) Investigating accommodation in language proficiency interviews using a new measure of lexical diversity. Language Testing, 19(1), 85–104.
Malvern, D., Richards, B. J., Chipere, N., & Durán, P.
(2004) Lexical Diversity and Language Development. Basingstoke: Palgrave Macmillan.
Massey, A. J., & Elliott, G. L.
(1996) Aspects of Writing in 16+ English Examinations Between 1980 & 1994. Cambridge: University of Cambridge Local Examinations Syndicate.
Massey, A. J., Elliott, G. L., & Johnson, N. K.
(2005) Variations in Aspects of Writing in 16+ English Examinations Between 1980 and 2004: Vocabulary, Spelling, Punctuation, Sentence Structure, Non-standard English. Cambridge: Cambridge Assessment.
Mazgutova, D., & Kormos, J.
(2015) Syntactic and lexical development in an intensive English for Academic Purposes programme. Journal of Second Language Writing, 29, 3–15.
McCarthy, P. M., & Jarvis, S.
(2010) MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42(2), 381–392.
Meurers, D., & Dickinson, M.
(2017) Evidence and interpretation in language learning research: Opportunities for collaboration with computational linguistics. Language Learning, 67(S1), 66–95.
Moxley, J.
(2013) Big data, learning analytics, and social assessment. Journal of Writing Assessment, 6(1), 1–10.
Myhill, D.
(1999) Writing matters: Linguistic characteristics of writing in GCSE English examinations. English in Education, 33(3), 70–81.
Myhill, D.
(2009) From talking to writing: Linguistic development in writing. BJEP Monograph Series II, 6, 27–44.
Olinghouse, N. G., & Leaird, J. T.
(2009) The relationship between measures of vocabulary and narrative writing quality in second- and fourth-grade students. Reading and Writing, 22, 545–565.
Olinghouse, N. G., & Wilson, J.
(2013) The relationship between vocabulary and writing quality in three genres. Reading and Writing: An Interdisciplinary Journal, 26, 45–65.
Paquot, M.
(2018) Phraseological competence: A missing component in university entrance language tests? Insights from a study of EFL learners’ use of statistical collocations. Language Assessment Quarterly, 15(1), 29–43.
Paquot, M.
(2019) The phraseological dimension in interlanguage complexity research. Second Language Research, 35(1), 121–145.
R Development Core Team
(2013) R: A Language and Environment for Statistical Computing (Version 1.0.136) [Computer software]. Vienna: R Foundation for Statistical Computing. Retrieved from [URL] (last accessed February 2019).
Read, J.
(2000) Assessing Vocabulary. Cambridge: Cambridge University Press.
Roessingh, H., Elgie, S., & Kover, P.
(2015) Using lexical profiling tools to investigate children’s written vocabulary in grade 3: An exploratory study. Language Assessment Quarterly, 12(1), 67–86.
Simpson-Vlach, R., & Ellis, N. C.
(2010) An Academic Formulas List: New methods in phraseology research. Applied Linguistics, 31(4), 487–512.
Storch, N.
(2009) The impact of studying in a second language (L2) medium university on the development of L2 writing. Journal of Second Language Writing, 18(2), 103–118.
Thorndike, E. L. & Lorge, I.
(1944) The Teacher’s Word Book of 30,000 Words. New York, NY: Teachers College, Columbia University.
Treffers-Daller, J., Parslow, P., & Williams, S.
(2018) Back to basics: How measures of lexical diversity can help discriminate between CEFR levels. Applied Linguistics, 39(3), 302–327.
Uccelli, P., Dobbs, C. L., & Scott, J.
(2013) Mastering academic language: Organization and stance in the persuasive writing of high school students. Written Communication, 30(1), 36–62.
Verspoor, M., Schmid, M. S., & Xu, X.
(2012) A dynamic usage based perspective on L2 writing. Journal of Second Language Writing, 21(3), 239–263.
Vidakovic, I., & Barker, F.
(2010) Use of words and multi-word units in Skills for Life Writing examinations. University of Cambridge ESOL Examinations Research Notes, 41, 7–14.
Vieregge, Q., Stedman, K., Mitchell, T., & Moxley, J.
(2012) Agency in the Age of Peer Production. Urbana, IL: National Council of Teachers of English.