Mapping of American English vocabulary by grade levels
We describe a large-scale effort to map English-language vocabulary to U.S. school grade levels. Our motivation is to rapidly expand graded vocabulary resources for work with native English speakers in the U.S., while taking school-related influences into account rather than relying solely on corpus-frequency approaches. We report on the initial data collection effort, which mapped about 22K word forms. We compare this mapping to other recent vocabulary mapping efforts, such as age-of-acquisition (AoA). We then describe efforts to automatically expand this resource using linguistically motivated variables and corpus-based methods. Our current resource maps more than 126K English word forms to U.S. school grade levels. We also compare a subset of our L1 mapped data to English L2 vocabulary levels, as expressed on the CEFR scale, and find considerable overlap in the order of vocabulary learning in L1 and L2 English.
Article outline
- Introduction
- Related work
- Method
- Data Collection
- Comparing VXGL and AoA
- Prediction
- Associative Estimate of Grade Level
- Results
- Comparison with CEFR mapping
- Discussion
- Conclusion
- Notes
- References