From early to future learner corpus research

Granger, Sylviane

doi:10.1075/ijlcr.00050.gra

Article published in:

International Journal of Learner Corpus Research
Vol. 10:2 (2024), pp. 247–279

Sylviane Granger | University of Louvain

The aim of this article is to survey the field of learner corpus research (LCR) from its origins to the present day and to provide some future perspectives. Key aspects of the field are compared across two periods: first-generation LCR, which extends from the late 1980s to 2000, and second-generation LCR, which covers the period from the early 2000s until today. The aspects compared are learner corpus design and collection, learner corpus methodology, statistical analysis, research focus, and links with related fields, in particular second language acquisition (SLA), foreign language teaching (FLT) and natural language processing (NLP). The survey shows that the field has undergone major theoretical and methodological changes and considerably extended its range of applications. Future developments that are likely to gain ground are grouped into three categories: increased diversity, increased interdisciplinarity and increased automation.

Keywords: learner corpus research, second language acquisition, foreign language teaching, natural language processing

Article outline

1. Introduction
2. First-generation LCR
- 2.1 Learner corpus design and collection
- 2.2 Learner corpus methodology
  - 2.2.1 Two main methodological approaches
  - 2.2.2 Learner corpus annotation
  - 2.2.3 Statistical analysis
- 2.3 Research focus
- 2.4 Links with SLA and FLT
3. Second-generation LCR
- 3.1 Learner corpus collection
- 3.2 Learner corpus design
- 3.3 Learner corpus methodology
  - 3.3.1 Two main methodological approaches
  - 3.3.2 Learner corpus annotation
  - 3.3.3 Statistical analysis
- 3.4 Research focus
- 3.5 Links with SLA, FLT and NLP
  - 3.5.1 SLA
  - 3.5.2 FLT
  - 3.5.3 Natural language processing
4. Future LCR
- 4.1 Increased diversity
- 4.2 Increased interdisciplinarity
- 4.3 Increased automation
5. Conclusion
Notes
References

Published online: 29 October 2024

https://doi.org/10.1075/ijlcr.00050.gra

References (192)

References

Aarts, J., & Granger, S. (1998). Tag sequences in learner corpora: A key to interlanguage grammar and discourse. In: S. Granger (Ed.), Learner English on computer (pp. 132–141). Addison Wesley Longman.

Ädel, A. (2008). Involvement features in writing: Do time and interaction trump register awareness? In G. Gilquin, S. Papp, & M. B. Díez-Bedmar (Eds.), Linking up contrastive and Learner Corpus Research (pp. 35–53). Rodopi.

Akbaş, E., & Dinçer, Z. O. (2021). Accuracy order in L2 grammatical morphemes: Corpus evidence from different proficiency levels of Turkish learners of English. Studies in Second Language Learning and Teaching, 11 (4), 607–627.

Alfaifi, A., Atwell, E., & Abuhakema, G. (2013). Error annotation of the Arabic Learner Corpus: A new error tagset. In I. Gurevych, C. Biemann, & T. Zesch (Eds.), Language Processing and Knowledge in the Web. Lecture Notes in Computer Science, vol 81051. Springer.

Altenberg, B., & Tapper, M. (1998). The use of adverbial connectors in advanced Swedish learners’ written English. In S. Granger (Ed.), Learner English on computer (pp. 80–93). Addison Wesley Longman.

André, V., Boulton, A., Ciekanski, M., & Cousinard, C. (2024). Learning to interact from conversational narratives: New perspectives for a data-driven approach integrating learner data. In S. Götz & S. Granger (Eds.), Learner Corpus Research for Pedagogical Purposes. Special issue of the International Journal of Learner Corpus Research, 10 (1), 67–106.

Axelsson, M. W., & Berglund, Y. (2002). The Uppsala Student English Corpus (USE): A multi-faceted resource for research and course development In L. Borin (Ed.), Parallel corpora, parallel worlds (pp. 79–90). Rodopi.

Ballier, N., & Martin, P. (2015). Speech annotation of learner corpora. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge handbook of Learner Corpus Research (pp. 107–134). Cambridge University Press.

Bestgen, Y. (2014). Inadequacy of the chi-squared test to examine vocabulary differences between corpora. Literary and Linguistic Computing, 29 (2), 164–170.

Bestgen, Y., & Granger, S. (2011). Categorizing spelling errors to assess L2 writing. International Journal of Continuing Engineering Education and Life-Long Learning, 21 (2/3), 235–252.

Biber, D., & Reppen, R. (1998). Comparing native and learner perspectives on English grammar: a study of complement clauses. In S. Granger (Ed.), Learner English on computer (pp. 145–158). Addison Wesley Longman.

Blázquez-Carratero, M. (2023). Building a pedagogic spellchecker for L2 learners of Spanish. ReCALL, 35 (3), 321–338.

Bley-Vroman, R. (1983). The comparative fallacy in interlanguage studies: The case of systematicity. Language Learning, 33 1, 1–17.

Borin, L., & Prütz, K. (2004). New wine in old skins? A corpus investigation of L1 syntactic transfer in learner language. In G. Aston, S. Bernardini, & D. Stewart (Eds.), Corpora and language learners (pp. 67–87). Benjamins.

Boulton, A., & Vyatkina, N. (2021). Thirty years of data-driven learning: Taking stock and charting new directions over time. Language Learning & Technology, 25 (3), 66–89.

Boyd, A., Hana, J., Nicolas, L., Meurers, D., Wisniewski, K., Abel, A., Schone, K., Stindlov, B., & Vettori, C. (2014). The MERLIN corpus: Learner language and the CEFR. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14). Reykjavik, Iceland. [URL]

Brunni, S., Lehto, M.-M., Jantunen, J. H., & Airaksinen, V. (2015). How to annotate morphologically rich learner language. Principles, problems and solutions. Bergen Language and Linguistic Studies (BeLLS), 6 1, 133–152.

Caines, A., & Buttery, P. (2019). The effect of task and topic on opportunity of use in learner corpora. In V. Brezina & L. Flowerdew (Eds.), Learner Corpus Research: New perspectives and applications (pp. 5–27). Bloomsbury.

Caines, A., Nicholls, D., & Buttery, P. (2017). Annotating errors and disfluencies in transcriptions of speech. Technical Report 915. University of Cambridge Computer Laboratory. [URL]

Carter, R., & McCarthy, M. (2006). Cambridge grammar of English: A comprehensive guide. Cambridge University Press.

Castello, E., Ackerley, K., & Coccetta, F. (Eds.). (2016). Studies in Learner Corpus Linguistics: Research and applications for foreign language teaching and assessment. Peter Lang.

Chi, M. A., Wong, P. K., & Wong, C. M. (1994). Collocational problems among ESL learners: A corpus-based study. In L. Flowerdew L. & A. K. Tong (Eds.), Proceedings of the seminar on corpus linguistics and lexicology (pp. 157–165). Hong Kong: University of Science and Technology.

Chuang, F.-Y., & Nesi, H. (2006). An analysis of formal errors in a corpus of L2 English produced by Chinese students. Corpora, 1 (2), 251–271.

Cogo, A., & Dewey, M. (2012). Analysing English as a lingua franca: A corpus-driven investigation. Continuum.

Council of Europe. (2001). Common European Framework of Reference for Languages: learning, teaching and assessment. Cambridge University Press.

Cowan, R., Choi, H. E., & Kim, D. H. (2003). Four questions for error diagnosis and correction in CALL. CALICO Journal, 20 (3), 451–463.

Cowan, R., Choo, J., & Lee, G. S. (2014). ICALL for improving Korean L2 writers’ ability to edit grammatical errors. Language Learning and Technology, 18 (3), 193–207.

Crosthwaite, P. (2013). An error analysis of L2 English discourse reference through learner corpora analysis. Linguistic Research, 30 (2), 163–193.

(2019). Definite article bridging relations in L2: A learner corpus study. Corpus Linguistics and Linguistic Theory, 15 (2), 297–319.

Dagneaux, E., Denness, S., Granger, S., & Meunier, F. (1996). Error tagging manual. Version 1.1. Louvain-la-Neuve: Centre for English Corpus Linguistics. University of Louvain.

Dagneaux, E., Denness, S., & Granger, S. (1998). Computer-aided error analysis. System, 26 (2), 163–174.

Davies, A. (2003). The native speaker: Myth and reality. Multilingual Matters.

De Cock, S., Granger, S., Leech, G., & McEnery, T. (1998). An automated approach to the phrasicon of EFL learners. In S. Granger (Ed.), Learner English on computer (pp. 67–79). Addison Wesley Longman.

de Haan, P. (1997). An experiment in English learner data analysis. In J. Aarts, I. de Mönnink, & H. Wekker (Eds.), Studies in English language and teaching (pp. 215–229). Rodopi.

Díaz-Negrillo, A., & Fernández-Domínguez, J. (2006). Error tagging systems for learner corpora. Revista Española de Lingüística Aplicada, 19 1, 83–102.

Díez-Bedmar, M. B. (2018). Fine-tuning descriptors for CEFR B1 level: Insights from learner corpora. ELT Journal, 72 (2), 199–209.

Díez-Bedmar, M. B., & Pérez-Paredes, P. (2012). The types and effects of peer native speakers’ feedback on CMC. Language Learning & Technology, 16 (1), 62–90.

Domínguez, L., Tracy-Ventura, N., Arche, M. J., Mitchell, R., & Myles, R. (2013). The role of dynamic contrasts in the L2 acquisition of Spanish past tense morphology. Bilingualism: Language and Cognition, 16 (3), 20131, 558–577.

Doughty, C. J., & Long, M. H. (2003). The scope of inquiry and goals of SLA. In C. J. Doughty & M. H. Long (Eds.), The handbook of Second Language Acquisition (pp. 3–16). Blackwell.

Dressen-Hammouda, D. (2013). Politeness strategies in the job application letter: Implications of Intercultural Rhetoric for designing writing feedback. Asp, 64 1, 139–159.

Durrant, P., & Schmitt, N. (2009). To what extent do native and non-native writers make use of collocations? IRAL — International Review of Applied Linguistics in Language Teaching, 47(2), 157–177.

Ebeling, S. O., & Hasselgård, H. (2021). The functions of n-grams in bilingual and learner corpora: An integrated contrastive approach. In S. Granger (Ed.), Perspectives on the L2 Phrasicon: The view from learner corpora (pp. 25–49). Multilingual Matters.

Ferraresi, A. (2024). Learner corpora in the era of ChatGPT. Building a corpus of Italian EFL learners’ interactions with chatbots. Paper presented at TALC 2024, July 7–10. Manchester.

Field, Y., & Yip, L. (1992). A comparison of internal conjunctive cohesion in the English essay writing of Cantonese speakers and native speakers of English. RELC Journal, 23 1, 15–28.

Flowerdew, L. (1997). Interpersonal strategies: Investigating interlanguage corpora. RELC Journal, 28 (1), 72–88.

Fuchs, R., & Werner, V. (2018). Tense and aspect in Second Language Acquisition and learner corpus research. Introduction to the special issue. International Journal of Learner Corpus Research, 4 (2), 143–163.

Fuyuno, M., Komiya, R., & Saitoh, T. (2018). Multimodal analysis of public speaking performance by EFL learners: Applying deep learning to understanding how successful speakers use facial movement. The Asian Journal of Applied Linguistics, 5 (1), 117–129.

Gablasova, D., Brezina, V., McEnery, T., & Boyd, E. (2017). Epistemic stance in spoken L2 English: The effect of task and speaker style. Applied Linguistics, 38 (5), 613–637.

Gablasova, D., Brezina, V., & McEnery, T. (2019). The Trinity Lancaster Corpus: Development, description and application. International Journal of Learner Corpus Research, 5 (2), 126–158.

Gaillat, T., Simpkin, A., Ballier, N., Stearns, B., Sousa, A., et al. (2021). Predicting CEFR levels in learners of English: The use of microsystem criterial features in a machine learning. ReCALL, 34 (2), 130–146.

Geertzen, J., Alexopoulou, T., & Korhonen, A. (2014). Automatic linguistic annotation of large scale L2 databases: The EF-Cambridge open language database (EFCamDat). Selected Proceedings of the 2012 Second Language Research Forum (pp. 240–254). Somerville, MA. [URL]

Gillard, P., & Gadsby, A. (1998). Using a learners’ corpus in compiling ELT. In S. Granger (Ed.), Learner English on computer (pp. 159–171). Addison Wesley Longman.

Gilquin, G. (2000). The integrated contrastive model: Spicing up your data. Languages in Contrast, 3 (1), 95–123.

(2007). To err is not all. What corpus and elicitation can reveal about the use of collocations by learners. Zeitschrift für Anglistik und Amerikanistik, 55 (3), 273–291.

(2021). Combining learner corpora and experimental methods. In N. Tracy-Ventura & M. Paquot (Eds.), The Routledge handbook of Second Language Acquisition and corpora (pp. 133–144). Routledge.

(2022). The Process Corpus of English in education: Going beyond the written text. Research in Corpus Linguistics, 10 (1), 31–44.

(2024). Lexical use in spoken New Englishes and learner Englishes: The effects of shared and distinct communicative constraints. In B. van Rooy & H. Kotze (Eds.), Constraints on language variation and change in complex multilingual contact settings (pp. 120–152). Benjamins.

(forthcoming). Second and foreign language learners: The effect of language exposure on the use of English phrasal verbs. International Journal of Bilingualism.

Gilquin, G., De Cock, S., & Granger, S. (2010). Louvain International Database of Spoken English Interlanguage. Handbook and CD-ROM. Presses universitaires de Louvain.

Gilquin, G., & Granger, S. (2021). The passive and the lexis-grammar interface: An inter-varietal perspective. In S. Granger (Ed.), Perspectives on the L2 phrasicon: The view from learner corpora (pp. 72–98). Multilingual Matters.

Gilquin, G., & Laporte, S. (2021). The use of online writing tools by learners of English: Evidence from a process corpus. International Journal of Lexicography, 34 (4), 472–492.

Gilquin, G., & Meriläinen, L. (2024). Constrained communication in EFL and ESL: The case of embedded inversion. English World-Wide, 45 (2), 196–223.

Glaznieks, A., Frey, J., Stopfner, M., Zanasi, L., & Nicolas, L. (2022). Leonide: A longitudinal trilingual corpus of young learners of Italian, German and English. International Journal of Learner Corpus Research, 8 (1), 97–120.

Götz, S. (2019). Filled pauses across proficiency levels, L1s and learning context variables. A multivariate exploration of the Trinity Lancaster Corpus Sample . International Journal of Learner Corpus Research, 5 (2), 159–180.

Götz, S., & Granger, S. (2024). Introduction: Learner corpus research for pedagogical purposes: An overview and some research perspectives. In S. Götz & S. Granger (Eds.), Learner corpus research for pedagogical purposes. Special issue of the International Journal of Learner Corpus Research, 10 (1), 1–38.

Götz, S., & Mukherjee, J. (2019). Investigating the effect of the study abroad variable on learner output: A pseudo-longitudinal study on spoken German learner English. In V. Brezina & L. Flowerdew (Eds.), Learner Corpus Research: New perspectives and applications (pp. 47–65). Bloomsbury.

Granger, S. (1993). The International Corpus of Learner English . In J. Aarts, P. de Haan, & N. Oostdijk (Eds.), English language corpora: Design, analysis and exploitation (pp. 57–69). Rodopi.

(1996). From CA to CIA and back: an integrated contrastive approach to computerized bilingual and learner corpora. In K. Aijmer, B. Altenberg, & M. Johansson (Eds.), Languages in contrast. Text-based cross-linguistic studies (pp. 37–51). Lund University Press.

(1997). Automated retrieval of passives from native and learner corpora: precision and recall. Journal of English Linguistics, 25 (4), 365–374.

(Ed.). (1998). Learner English on computer. Addison Wesley Longman.

(1999). Use of tenses by advanced EFL learners?: Evidence from an error-tagged computer corpus. In H. Hasselgård & S. Oksefjell (Eds.), Out of corpora. Studies in honour of Stig Johansson (pp. 191–202). Rodopi.

(2003). Error-tagged learner corpora and CALL: A promising synergy. CALICO, 20 (3), 465–480.

(2015). Contrastive interlanguage analysis: A reappraisal. International Journal of Learner Corpus Research, 1 (1), 7–24.

(2017). Learner corpora in foreign language education. In S. Thorne & S. May (Eds.), Language and technology. Encyclopedia of language and education. 3rd edition. (pp. 427–440). Springer.

(2021). Have Learner Corpus Research and Second Language Acquisition finally met? In B. Le Bruyn & M. Paquot (Eds.), Learner Corpus Research meets Second Language Acquisition (pp. 243–257). Cambridge University Press.

Granger, S., & Bestgen, Y. (2014). The use of collocations by intermediate vs. advanced non-native writers: A bigram-based study. IRAL, 52 (3), 229–252.

Granger, S., Cassart, A., Dagneaux, E., Husquet, C., Verhulst, N., & Watrin, P. (2002). Error tagging manual for L2 French. CECL Papers. Centre for English Corpus Linguistics: Université catholique de Louvain [URL]

Granger, S., & Lefer, M.-A. (2023). Learner translation corpora: Bridging the gap between learner corpus research and corpus-based translation studies. In S. Granger & M.-A. Lefer (Eds.) Learner translation corpora. Special issue of the International Journal of Learner Corpus Research, 9 (1), 1–28.

Granger, S., & Paquot, M. (2015). Electronic lexicography goes local: Design and structures of a needs-driven online academic writing aid. Lexicographica, 31 (1), 118–141.

(2022). The Louvain English for Academic Purposes Dictionary: User Manual. CECL Papers 5. Louvain-la-Neuve: Centre for English Corpus Linguistics/Université catholique de Louvain. [URL]

(forthcoming). Learner corpora of Language for Specific Purposes. In C. A. Chapelle (Ed.) Encyclopedia of Applied Linguistics. 2nd Edition. Wiley Blackwell.

Granger, S. & Rayson, P. (1998). Automatic lexical profiling of learner texts. In S. Granger (Ed.) Learner English on computer (pp. 119–131). Addison Wesley Longman.

Granger, S., Swallow, H., & Thewissen, J. (2022). The Louvain error tagging manual. Version 2.0. CECL Papers 4. Louvain-la-Neuve: Centre for English Corpus Linguistics/Université catholique de Louvain. [URL]

(2023). The UCLouvain Error Editor user guide — Version 2.0. CECL Papers 6. Louvain-la-Neuve: Centre for English Corpus Linguistics/Université catholique de Louvain. [URL]

Granger, S., & Tribble, C. (1998). Learner corpus data in the foreign language classroom: form-focused instruction and data-driven learning. In S. Granger (Ed.), Learner English on computer (pp. 199–209). Addison Wesley Longman.

Granger, S., & Tyson, S. (1996). Connector usage in the English essay writing of native and non-native EFL speakers of English. World Englishes, 15 (1), 17–27.

Gries, S. Th. (2006). Some proposals towards a more rigorous corpus linguistics. ZAA, 54 (2), 191–202.

(2008). Corpus-based methods in analyses of SLA data. In P. Robinson & N. C. Ellis (Eds.), Handbook of Cognitive Linguistics and Second Language Acquisition (pp. 406–431). Routledge.

(2015). Statistics for learner corpus research. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge handbook of Learner Corpus Research (pp. 159–181). Cambridge University Press.

(2022). MuPDAR for corpus-based learner and variety studies: Two (more) suggestions for improvement. In S. Flach & M. Hilpert (Eds.), Broadening the spectrum of corpus linguistics: New approaches to variability and change (pp. 257–283). Benjamins.

Gyllstad, H., & Snoder, P. (2021). Exploring learner corpus data for language testing and assessment purposes. In S. Granger (Ed.) Perspectives on the L2 phrasicon: The view from learner corpora (pp. 49–71). Multilingual Matters.

Han, J., Yoo, H., Myung, J., Kim, M., Lee, T. Y., Ahn, S.-Y., & Oh, A. (2024). RECIPE4U: Student-ChatGPT interaction dataset in EFL writing education. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (pp. 13666–13676). Torino, Italia. [URL]

Higgins, D., Ramineni, C., & Zechner, K. (2015). Learner corpora and automated scoring. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge handbook of Learner Corpus Research (pp. 587–604). Cambridge University Press.

Horváth, J. (2001). Advanced writing in English as a foreign language: A corpus-based study of processes and products. Lingua Franca Csoport.

Housen, A. (2002). A corpus-based study of the L2 acquisition of the English verb system. In S. Granger, S., J. Hung J., & S. Petch-Tyson (Eds.) Computer Learner Corpora, Second Language Acquisition and Foreign Language Learning (pp. 77–116). Benjamins.

Howarth, P. A. (1996). Phraseology in English academic writing: Some implications for language learning and dictionary making. Lexicographica Series Maior 75. Max Niemeyer.

Hsieh, W.-M., & Liou, H.-C. (2008). A case study of corpus-informed online academic writing for EFL graduate students. CALICO Journal, 26 (1), 28–47.

Huang, Y., Murakami, A., Theodora Alexopoulou, T., & Korhoneni, A. (2018). Dependency parsing of learner English. International Journal of Corpus Linguistics, 23 (1), 28–54.

Hyland, K., & Milton, J. (1997). Qualification and certainty in L1 and L2. Journal of Second Language Writing, 6 (2), 183–205.

Ionin, T., & Díez-Bedmar, M. B. (2021). Article use in Russian and Spanish learner writing at CEFR B1 and B2 Levels: Effects of proficiency, native language, and specificity. In B. Le Bruyn & M. Paquot (Eds.), Learner Corpus Research meets Second Language Acquisition (pp. 243–257). Cambridge University Press.

Ishikawa, S. (2023). The ICNALE Guide. An Introduction to a learner corpus study on Asian learners’ L2 English. Routledge.

Ivaska, I., Ferraresi, A., & Bernardini, S. (2022). Syntactic properties of constrained English: A corpus-driven approach. In S. Granger & M.-A. Lefer (Eds.), Extending the scope of corpus-based translation studies (pp. 133–157). Bloomsbury.

Ivaska, I., Bernardini, S., & Ferraresi, A. (2024). The complex case of constrained communication: A corpus-driven, multilingual and multi-register search for the common ground between non-native and translated language. In B. van Rooy & H. Kotze (Eds.), Constraints on language variation and change in complex multilingual contact settings (pp.191–222). Benjamins.

Jadoulle, P. (2024). Investigating noviceness and non-nativeness in academic writing: A cross-linguistic approach to stance. Unpublished doctoral dissertation. University of Louvain: Louvain-la-Neuve.

Jarvis, S., & Pavlenko, A. (2008). Crosslinguistic influence in language and cognition. Routledge.

Jung, Y., Gablasova, D., Brezina, V., & Schmück, H. (2024). Developing a coding scheme for annotating opinion statements in L2 interactive spoken English with application for language teaching and assessment. Research in Corpus Linguistics, 12 (2), 146–173.

Källkvist, M. (1995). Lexical errors among verbs? A pilot study of the vocabulary of advanced Swedish learners of English. Working Papers in English and Applied Linguistics (pp. 103–115). Lund University [URL]

(1999). Form-class and task-type effects in learner English: A study of advanced Swedish learners. Lund University Press.

Kaszubski, P. (1997). Polish student writers — Can corpora help them? In B. Lewandowska-Tomaszczyk & P. J. Melia (Eds.) PALC’97 — Practical applications in language corpora (pp. 133–158). Lódź University Press.

(1998). Enhancing a writing textbook: A national perspective. In S. Granger (Ed.) Learner English on computer (pp. 172–185). Addison Wesley Longman.

Kawecki, R. (2013). A beginner French learner corpus. In S. Granger, G. Gilquin, & F. Meunier (Eds.) Twenty years of Learner Corpus Research: Looking back, moving ahead (pp. 247–261). Presses universitaires de Louvain.

Kyle, K. (Ed.) (2021). Natural language processing for learner corpus research. Special issue of the International Journal of Learner Corpus Research, 7 (1).

Kyle, K., & Eguchi, M. (2023). Assessing spoken lexical and lexicogrammatical proficiency using features of word, bigram, and dependency bigram use. The Modern Language Journal, 107 (2), 531–564.

Lado, R. (1957). Linguistics across cultures: Applied linguistics for language teachers. University of Michigan Press.

Larsen-Freeman, D. (2014). Another step to be taken — Rethinking the end point of the interlanguage continuum. In Z. Han & E. Tarone (Eds.) Interlanguage. Forty years later (pp. 203–220). Benjamins.

Larsson, T., Egbert, J., & Biber, D. (2022a). On the status of statistical reporting versus linguistic description in corpus linguistics: A ten-year perspective. Corpora, 17 (1), 137–157.

Larsson, T., Reppen, R., & Dixon, T. (2022b). A phraseological study of highlighting strategies in novice and expert writing. Journal of English for Academic Purposes, 60 1, 101179.

Laufer, B., & Nation, P. (1995). Vocabulary size and use: Lexical richness in L2 written production. Applied Linguistics, 16 1, 307–322.

Leacock, C., Chodorow, M., & Tetreault, J. (2015). Automatic grammar — and spell-checking for language learners. In S. Granger, G. Gilquin, & F. Meunier (Eds.) The Cambridge handbook of Learner Corpus Research (pp. 567–586). Cambridge University Press.

Lee, L.-H., Chang, L.-P., & Tseng, Y.-H. (2016). Developing learner corpus annotation for Chinese grammatical errors. 2016 International Conference on Asian Language Processing (IALP), Tainan, Taiwan (pp. 254–257). [URL].

Leech, G. (1998). Preface: Learner corpora: what they are and what can be done with them. In: S. Granger (Ed.) Learner English on computer (pp. xiv–xx). Pearson.

Leńko-Szymańska, A., & Biel, L. (2023). Terminological collocations in trainee and professional legal translations. A learner-corpus study of L2 company law translations. International Journal of Learner Corpus Research, 9 (1), 29–59.

Leńko-Szymańska, A., & Götz, S. (Eds.) (2022). Complexity, accuracy and fluency in Learner Corpus Research. Benjamins.

Lessard, G. (1999). Review of Learner English on computer (Granger Ed., 1998). Computational Linguistics, 25 (2), 302–303.

Li, Q., Tarp, S., Nomdedeu-Rull, A. (2024). The necessary symbiosis: How ChatGPT co-authored a new type of learner’s grammar to be displayed in a digital writing assistant. [Manuscript submitted for publication].

Lim, J., Mark, G., Pérez-Paredes, P., & O’Keeffe, A. (2024). Exploring part of speech (POS)-tag sequences in a large-scale learner corpus of L2 English: A developmental perspective. Corpora, 19 (1), 31–59.

Lorenz, G. (1998). Overstatement in advanced learners’ writing: Stylistic aspects of adjective intensification. In S. Granger (Ed.) Learner English on computer (pp. 53–66). Addison Wesley Longman.

Lozano, C., & Díaz-Negrillo, A. (2019). Using learner corpus methods in L2 acquisition research. The morpheme order studies revisited with Interlanguage Annotation. Revísta Española de Lingüistica Aplicada, 32 1, 82–124.

Lüdeling, A., & Hirschmann, H. (2015). Error annotation systems. In S. Granger, G. Gilquin, & F. Meunier (Eds.) The Cambridge handbook of Learner Corpus Research (pp. 135–157). Cambridge University Press.

Lüdeling, A., M. Walter, E. Kroymann, & P. Adolphs. (2005). Multi-level error annotation in learner corpora. Proceedings from the Corpus Linguistics Conference Series, Vol. 1, no. 1. [URL]

Ma, Q., Crosthwaite, P., Sun, D., & Zou, D. (2024). Exploring ChatGPT literacy in language education: A global perspective and comprehensive approach. Computers and education:Artificial intelligence.

Marchand, T., & Akutsu, S. (2015). First steps in assigning proficiency to texts in a learner corpus of computer-mediated communication. In M. Callies & S. Götz (Eds.) Learner corpora in language testing and assessment (pp. 85–112). Benjamins.

Marti, L., Yilmaz, S., & Bayyurt, Y. (2019). Reporting research in applied linguistics: The role of nativeness and expertise. Journal of English for Academic Purposes, 40 1, 98–114.

Meunier, F. (2016). Introduction to the LONGDALE Project. In E. Castello, K. Ackerley, & F. Coccetta (Eds.) Studies in learner corpus linguistics: Research and applications for foreign language teaching and assessment (pp. 123–126). Peter Lang.

Milton, J. (1998). Exploiting L1 and interlanguage corpora in the design of an electronic language learning and production environment. In S. Granger (Ed.) Learner English on computer (pp. 186–198). Addison Wesley Longman.

Milton, J., & Chowdhury, N. (1994). Tagging the interlanguage of Chinese learners of English. In L. Flowerdew & A. K. K. Tong (Eds.) Entering text (pp. 127–143). The Hong Kong University of Science and Technology.

Milton, J., & Tsang, E. S. C. (1993). A corpus-based study of logical connectors in EFL students’ writing: Directions for future research. In R. Pemberton & E. S. C. Tsang (Eds.) Studies in lexis (pp. 215–246). The Hong Kong University of Science and Technology.

Möller, V. (2017). A statistical analysis of learner corpus data, experimental data and individual differences: Monofactorial vs. multifactorial approaches. In P. de Haan, R. de Vries, & S. van Vuuren (Eds.) Language, learners and levels: Progression and variation (pp. 409–439). Benjamins.

Murakami, A. (2013). Cross-linguistic influence on the accuracy order of L2 English grammatical morphemes. In S. Granger, G. Gilquin, & F. Meunier (Eds.), Twenty years of learner corpus research. Looking back, moving ahead (pp. 325–334). Presses universitaires de Louvain.

Murakami, A., & Alexopoulou, T. (2016). L1 influence on the acquisition order of English grammatical morphemes: A learner corpus study. Studies in Second Language Acquisition, 38 (3), 365–401.

Mizumoto, A., & Eguchi, M. (2023). Exploring the potential of using an AI language model for automated essay scoring. Research Methods in Applied Linguistics, 2 1, 100050.

Myles, F. (2005). Interlanguage corpora and second language acquisition research. Second Language Research, 21 (4), 373–391.

Neumanová, Z. (2023). Investigating L2 English preposition use by Czech university students: A learner corpus study. Ostrava Journal of English Philology, 15 (1), 93–119.

Nicholas, A., Blake, J., Mozgovoy, M., & Perkins, J. (2023). Investigating pragmatic failure in L2 English email writing among Japanese university EFL learners. A learner corpus approach. Register Studies, 5 (1), 23–51.

O’Donnell, M. (2008). The UAM Corpus Tool: Software for corpus annotation and exploration. In Proceedings of the XXVI Congreso de AESLA (pp. 3–5), Almeria, Spain.

Pan, Z. (2024). The use of semi-automatic annotation in speech acts performed by learners of English. World Journal of English Language, 14 (6), 1–12.

Paquot, M. (2024). Learner corpus research: a critical appraisal and roadmap for contributing (more) to SLA research agendas. Corpus Linguistics and Linguistic Theory.

Paquot, M., & Plonsky, L. (2017). Quantitative research methods and study quality in learner corpus research. International Journal of Learner Corpus Research, 3 (1): 61–94.

Petch-Tyson, S. (1998). Writer/reader visibility in EFL written discourse. In S. Granger (Ed.) Learner English on computer (pp. 107–118). Addison Wesley Longman.

Picoral, A., Staples, S., & Reppen, R. (2021). Automated annotation of learner English: An evaluation of software tools. International Journal of Learner Corpus Research, 7 (1), 17–52.

Pilar Valverde Ibañez, M., & Ohtani, A. (2014). Annotating article errors in Spanish learner texts: Design and evaluation of an annotation scheme. Proceedings of the 28th Pacific Asia Conference on Language, Information and Computation (pp. 234–243). Phuket, Thailand. [URL]

Rakhilina, E., Vyrenkova, A., Mustakimova, E., Alina Ladygina, A., & Smirnov, I. (2016). Building a learner corpus for Russian. In Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition. Umeå, Sweden. [URL]

Rautionaho, P., & Deshors, S. C. (2018). Progressive or not progressive? Modeling constructional choices in EFL and ESL. International Journal of Learner Corpus Research, 4 (2), 225–252.

Rebuschat, P., Meurers, D., & McEnery, T. (2017). Language learning research at the intersection of experimental, computational, and corpus-based approaches. Language Learning, 67 : S1, 6–13.

Ringbom, H. (1998). Vocabulary frequencies in advanced learner English: A cross-linguistic approach. In S. Granger (Ed.) Learner English on computer (pp. 41–52). Addison Wesley Longman.

Römer, U. (2009). English in academia: Does nativeness matter? Anglistik: International Journal of English Studies, 20 (2), 89–100.

Rosen, A. (2016). Building and using corpora of non-native Czech. ITAT 2016 Proceedings, CEUR Workshop Proceedings, 1649, 80–87.

Rosen, A., Hana, J., Štindlová, B., & Feldman, A. (2014). Evaluating and automating the annotation of a learner corpus. Language Resources & Evaluation, 48, 65–92.

Rundell, M. (2009). The future has arrived: A new era in electronic dictionaries. MED Magazine, 54. [URL]

Rundell, M., & Granger, S. (2007). From corpora to confidence. English Teaching Professional, 50, 15–18.

Sarte, K. M., & Gnevsheva, K. (2022). Noun phrasal complexity in ESL written essays under a constructed-response task: Examining proficiency and topic effects. Assessing Writing, 51, 100595.

Shaw, S. (1997). The use of language corpora in the compilation of the Longman Dictionary of Contemporary English (third edition). In B. Lewandowska-Tomaszczyk & P. J. Melia (Eds.) PALC’97 — Practical applications in language corpora (pp. 269–275). Lódź University Press.

Skehan, P. (1998). A cognitive approach to language learning. Oxford University Press.

Štindlová, B., Škodová, S., Rosen, A., & Hana, J. (2013). A learner corpus of Czech: Current state and future directions. In S. Granger, G. Gilquin, & F. Meunier (Eds.) Twenty years of Learner Corpus Research: Looking back, moving ahead (pp. 435–446). Presses universitaires de Louvain.

Tan, M. (2005). Authentic language or language errors? Lessons from a learner corpus. ELT Journal, 59 (2), 126–134.

Tao, Y., Agrawal, A., Dombi, J., Sydorenko, T., & Lee, J. I. (2024). ChatGPT Role-play dataset: Analysis of user motives and model naturalness. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (pp. 3133–3145). Torino, Italia. [URL]

Tarp, S. (2023). Eppur si muove: Lexicography is becoming intelligent! Lexikos, 33(2), 107–131.

Tarp, S., Fisker, K., & Sepstrup, P. (2017). L2 writing assistants and context-aware dictionaries: New challenges to lexicography. Lexikos, 27, 494–521.

Tarp, S., & Nomdedeu-Rull, A. (2024). Who has the last word? Lessons from using ChatGPT to develop an AI-based Spanish writing assistant. Círculo de Lingüística Aplicada a la Comunicación, 97, 309–321.

Tenfjord, K., Meurer, P., & Hofland, K. (2006). The ASK Corpus — a language learner corpus of Norwegian as a second language. In N. Calzolari, K. Choukri, A. Gangemi, B. Maegaard, J. Mariani, J. Odijk, & D. Tapias (Eds.) Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC06) (pp. 1821–1824).

Thewissen, J. (2015). Accuracy across proficiency levels: A learner corpus approach. Presses universitaires de Louvain.

Tono, Y. (1996). Using learner corpora for L2 lexicography? Information of collocational errors for EFL learners. Lexikos, 6, 116–132.

Tono, Y. (2000). A computer learner corpus-based analysis of the acquisition order of English grammatical morphemes. In L. Burnard, & T. McEnery (Eds.) Rethinking language pedagogy from a corpus perspective (pp. 123–132). Peter Lang.

Tono, Y., Kaneko, T., Isahara, H., Saiga, T., Izumi, E., Narita, M., & Kaneko, E. (2001). The Standard Speaking Test (SST) Corpus: A 1 million-word spoken corpus of Japanese learners of English and its implication for L2 lexicography. Language Facts and Perspectives, 11(2), 7–17.

Tono, Y., Fukuda, K., Takebayashi, K., & Kawamoto, N. (2024). Using ChatGPT and CEFR profile information to create learner corpora with error codings and comparable texts with different CEFR levels. Paper presented at TALC 2024, July 7–10. Manchester.

Tracy-Ventura, N., & Huensch, A. (2018). The potential of publicly shared longitudinal learner corpora in SLA research. In A. Gudmestad & A. Edmonds (Eds.) Critical reflections on data in Second Language Acquisition (pp. 149–170). Benjamins.

Tracy-Ventura, N., Mitchell, R., & McManus, K. (2016). The LANGSNAP longitudinal learner corpus: Design and use. In M. Alonso Ramos (Ed.) Spanish Learner Corpus Research: Current trends and future perspectives (pp. 117–142). Benjamins.

Tracy-Ventura, N., & Paquot, M. (Eds.). (2021). The Routledge handbook of Second Language Acquisition and corpora. Routledge.

Turton, N. D. & Heaton, J. B. (1996). Longman dictionary of common errors. New Edition. Addison Wesley Longman.

Vajjala, S. (2018). Automated assessment of non-native learner essays: Investigating the role of linguistic features. International Journal of Artificial Intelligence in Education, 28, 79–105.

Vanderbauwhede, G. (2012). The Integrated Contrastive Model evaluated: The French and Dutch demonstrative determiner in L1 and L2. International Journal of Applied Linguistics, 22(3), 392–413.

Vandeweerd, N., Housen, A., & Paquot, M. (2023). Comparing the longitudinal development of phraseological complexity across oral and written tasks. Studies in Second Language Acquisition, 45(4), 787–811.

Vinogradova, O. (2016). The role and applications of expert error annotation in a corpus of English learner texts. Proceedings of “Dialog 2016”, 15, 740–751. [URL]

Vinogradova, O. (2019). To automated generation of test questions on the basis of error annotations in EFL essays. A time-saving tool? In S. Götz & J. Mukherjee (Eds.) Learner corpora and language teaching (pp. 29–48). Benjamins.

Virtanen, T. (1997). The progressive in NNS and NS student compositions: Evidence from the International Corpus of Learner English. In M. Ljung (Ed.) Corpus-based studies in English (pp. 299–309). Rodopi.

Virtanen, T. (1998). Direct questions in argumentative student writing. In S. Granger (Ed.) Learner English on computer (pp. 94–106). Addison Wesley Longman.

Vyatkina, N. (2013). Analyzing part-of-speech variability in a longitudinal learner corpus and a pedagogic corpus. In S. Granger, G. Gilquin, & F. Meunier (Eds.) Twenty years of Learner Corpus Research: Looking back, moving ahead (pp. 479–491). Presses universitaires de Louvain.

Wang, Q., & Yuan, Z. (2024). Assessing the efficacy of grammar error correction: A human evaluation approach in the Japanese context. arXiv:2402.18101

Wang, W., & Zhang, J. (2023). Factors predicting human performance in error annotation for non-native speech corpus. Speech Communication, 149, 38–46.

Wang, X., Bruno, J., Molloy, H., Evanini, K., & Zechner, K. (2017). Discourse annotation of non-native spontaneous spoken responses using the rhetorical structure theory framework. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 263–268). Vancouver, Canada. Association for Computational Linguistics.

Weisser, M. (2021). Profiling learners through pragmatically and error annotated corpora. In P. Pérez-Paredes & G. Mark (Eds.) Beyond concordance lines: Corpora in language education (pp. 121–148). Benjamins.

Xia, D., Sulzer, M. A., & Pae, H. K. (2023). Phrase-frames in business emails: A contrast between learners of business English and working professionals. Text & Talk, 44 (5), 693–714.