Chapter published in: Vocabulary Knowledge: Human ratings and automated measures
Edited by Scott Jarvis and Michael Daller
[Studies in Bilingualism 47] 2013
pp. 157–184
Chapter 6. Modelling L2 vocabulary learning
In this paper we propose a frequency-based model of vocabulary acquisition and test it on texts written by second language (L2) writers of English. One goal of the paper is to address an issue that has arisen in previous work attempting to verify Laufer and Nation’s (1995) proposal for using lexical frequency profiling tools with L2 texts to estimate the underlying vocabulary size of the L2 writers. That issue is the direct application of Zipf’s law (Zipf, 1935, 1949) to student texts (see Meara, 2005; Edwards & Collins, 2011), which assumes that words are learned in the order of their frequency in the language at large. As this is clearly not the case, a more valid model of vocabulary learning needs to account for the presence of less common words at different points of the acquisition process. Our model supposes that learning consists of a sequence of exposures to words, which are seen in proportion to their frequency in the language as a whole, and that a certain number of exposures (a model parameter) is required for a word to be learned. This allows us to calculate the probability that a given word (whether common or uncommon) has been learned after a given number of exposures in this sequence. Furthermore, it allows us to calculate the likelihood that a word is used once it has been learned, based on the word’s rank in the learner’s interlanguage (we also considered the possibility of basing this step on the word’s rank in the L2 as a whole), from which we can predict frequency distributions for learner texts. For a given 1K word count in texts, the model predicts a smaller underlying productive vocabulary than that predicted by the naïve application of Zipf’s law. We then fit the parameters of the model to texts written by 90 francophone ESL learners at different points of a five-month intensive program. The best fit was obtained with a ‘number of exposures’ parameter value of 3. The model reproduces the steeper-than-Zipf tail of the frequency distribution of words observed in texts.
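The learning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that exposures are independent draws from a Zipf distribution with exponent 1 over a finite word list, and that a word counts as learned once it has been seen at least k times, so the probability of having learned it is a binomial tail. The function names, the 5,000-word list, and the exposure counts are hypothetical choices made for illustration; only the value k = 3 comes from the abstract.

```python
import math


def zipf_probabilities(vocab_size, exponent=1.0):
    """Per-word exposure probabilities under Zipf's law (assumed exponent 1)."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def prob_learned(p, n_exposures, k):
    """Probability that a word with exposure probability p has been seen
    at least k times in n_exposures independent draws (binomial tail)."""
    prob_fewer_than_k = sum(
        math.comb(n_exposures, j) * p ** j * (1.0 - p) ** (n_exposures - j)
        for j in range(k)
    )
    return 1.0 - prob_fewer_than_k


def expected_vocab_size(probs, n_exposures, k):
    """Expected number of learned words: sum of per-word learning probabilities."""
    return sum(prob_learned(p, n_exposures, k) for p in probs)


if __name__ == "__main__":
    probs = zipf_probabilities(5000)   # hypothetical 5,000-word language
    k = 3                              # best-fit value reported in the abstract
    for n in (1_000, 10_000, 100_000):  # hypothetical exposure counts
        size = expected_vocab_size(probs, n, k)
        print(f"{n:>7} exposures: expected vocabulary of about {size:.0f} words")
```

Under these assumptions a rare word always has a non-zero chance of being learned early, which is the property that separates this kind of model from one where words are acquired strictly in frequency order. The second step in the abstract, the likelihood that a learned word is actually used, is not covered by this sketch.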
Published online: 14 August 2013