Detmar Meurers | Department of Linguistics, University of Tübingen
We consider the opportunities that big educational learner corpora present for Second Language Acquisition (SLA) research. In particular, we focus on the EF Cambridge Open Language Database (EFCAMDAT), an open-access database of student writing submitted to Englishtown, the online school of EF Education First. EFCAMDAT stands out for its size (33 million words, 85 thousand learners) and for its range of 128 writing tasks, which cover all CEFR levels and include data from learners of various nationalities. We discuss methodological issues that arise when analyzing big data resources generated in educational contexts and argue that Natural Language Processing (NLP) is essential for the automated processing of such datasets. As a case study, we follow the developmental trajectory of relative clauses, a construction whose identification requires deeper syntactic analysis. We consider specific issues that can affect the developmental trajectory, including task effects, formulaic language, and national language effects.