Detmar Meurers | Department of Linguistics, University of Tübingen
We consider the opportunities presented by big educational learner corpora for Second Language Acquisition (SLA). In particular, we focus on the EF Cambridge Open Language Database (EFCAMDAT), an open access database of student writings submitted to Englishtown, the online school of EF Education First. EFCAMDAT stands out for its size (33 million words, 85 thousand learners) and its range of 128 writing tasks covering all CEFR levels, with data from learners of various nationalities. We discuss methodological issues arising from analyzing big data resources generated in educational contexts and argue that Natural Language Processing (NLP) is essential for the automated processing of such datasets. As a case study, we follow the developmental trajectory of relative clauses, a construction that necessitates deeper syntactic analysis. We consider specific issues that can affect the developmental trajectory, including task effects, formulaic language and national language effects.