Chapter 6. Modelling L2 vocabulary learning

Edwards, Roderick; Collins, Laura

doi:10.1075/sibil.47.08ch6

Part of

Vocabulary Knowledge: Human ratings and automated measures
Edited by Scott Jarvis and Michael Daller
[Studies in Bilingualism 47] 2013
pp. 157–184

Roderick Edwards | University of Victoria, Victoria, British Columbia, Canada and Concordia University, Montreal, Quebec, Canada.

Laura Collins | University of Victoria, Victoria, British Columbia, Canada and Concordia University, Montreal, Quebec, Canada.

In this paper we propose a frequency-based model of vocabulary acquisition and test it on texts written by second language (L2) writers of English. One goal of the paper is to address an issue that has arisen in previous work attempting to verify Laufer and Nation’s (1995) proposal for using lexical frequency profiling tools with L2 texts to estimate the underlying vocabulary size of the L2 writers. That issue is the direct application of Zipf’s (1935, 1949) law to student texts (see Meara, 2005; Edwards & Collins, 2011), which assumes that words are learned in the order of their frequency in the language at large. As this is clearly not the case, a more valid model of vocabulary learning needs to account for the presence of less common words at different points of the acquisition process. Our model supposes that learning consists of a sequence of exposures to words, seen in proportion to their frequency in the language as a whole, and that a certain number of exposures (a model parameter) is required for a word to be learned. This allows calculation of the probability that a given word (whether common or uncommon) has been learned after a given number of exposures in this sequence. It also allows calculation of the likelihood that a word is used once it has been learned, based on the word’s rank in the learner’s interlanguage (we also considered basing this step on the word’s rank in the L2 as a whole); from these likelihoods we can predict frequency distributions for learner texts. For a given 1K word count in texts, the model predicts a smaller underlying productive vocabulary than the naïve application of Zipf’s law does. We then fit the parameters of the model to texts written by 90 francophone ESL learners at different points of a five-month intensive program. The best fit was obtained with a ‘number of exposures’ parameter value of 3. The model reproduces the steeper-than-Zipf tail of the frequency distribution of words observed in texts.
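
The learning step described in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: it assumes a finite lexicon whose word probabilities follow Zipf's law with exponent 1, independent exposures, and a word counting as learned once it has been seen at least `k` times. The function names, the lexicon size, and the exposure count are hypothetical choices made for illustration only.

```python
import math


def zipf_probs(n_types, exponent=1.0):
    """Exposure probability for each rank 1..n_types (assumed Zipfian)."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, n_types + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def prob_learned(p, n_exposures, k):
    """P(a word with exposure probability p is seen at least k times
    in n_exposures independent draws), i.e. P(Binomial(n, p) >= k)."""
    p_fewer_than_k = sum(
        math.comb(n_exposures, j) * p ** j * (1.0 - p) ** (n_exposures - j)
        for j in range(k)
    )
    return 1.0 - p_fewer_than_k


def expected_vocab_size(probs, n_exposures, k):
    """Expected number of learned words: sum of per-word probabilities."""
    return sum(prob_learned(p, n_exposures, k) for p in probs)


if __name__ == "__main__":
    probs = zipf_probs(5000)   # hypothetical lexicon of 5,000 word types
    n, k = 10_000, 3           # k = 3 echoes the best-fit value in the abstract
    print("P(rank 1 learned):   ", prob_learned(probs[0], n, k))
    print("P(rank 5000 learned):", prob_learned(probs[-1], n, k))
    print("Expected vocabulary: ", expected_vocab_size(probs, n, k))
```

Under these assumptions a rare word has a small but nonzero chance of being learned early, which is the property that a strict learned-in-frequency-order reading of Zipf's law lacks.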

Published online: 14 August 2013

https://doi.org/10.1075/sibil.47.08ch6
