Data-driven learning (DDL) typically involves language learners consulting corpus data, either directly or via prepared materials, to answer questions about language. The approach has been mooted since the beginning of the modern era of corpus linguistics and has come to be associated with work by Tim Johns, who coined the term in print in 1990. Since then, hundreds of studies have attempted to evaluate some aspect of DDL, giving rise to several reviews and syntheses. This paper introduces DDL and discusses the syntheses to date, before analysing a rigorously compiled collection of 351 studies published up to and including 2018. While previous syntheses have evaluated the field, the objective here is to provide an overview of how researchers see DDL across the board, to identify more clearly what DDL actually looks like today and how it has evolved from its early beginnings in the 1980s, and to suggest avenues for future research in underexplored areas.
Abu Alshaar, A., & Abuseileek, A. F. (2013). Using concordancing and word processing to improve EFL graduate students’ written English. JALT CALL Journal, 9(1), 59–77.
Al-Gamal, A. A. M., & Mohammed Ali, E. A. M. (2019). Corpus-based method in language learning and teaching. International Journal of Research and Analytical Reviews, 6(2), 473–476.
Allan, R. (2009). Can a graded reader corpus provide ‘authentic’ input? ELT Journal, 63, 23–32.
An, X.-H., & Xu, M.-Y. (2013). An empirical research on DDL in L2 writing. US-China Education Review A, 3(9), 693–701.
Anthony, L. (2019). AntConc [version 3.5.8m]. Tokyo: Waseda University. [URL]
Bax, S. (2003). CALL: Past, present and future. System, 31(1), 13–28.
Bernardini, S. (2000). Systematising serendipity: Proposals for concordancing large corpora with language learners. In L. Burnard & T. McEnery (Eds.), Rethinking language pedagogy from a corpus perspective (pp. 225–234). Peter Lang.
Boulton, A. (2008). But where’s the proof? The need for empirical evidence for data-driven learning. In M. Edwardes (Ed.), Technology, ideology and practice in applied linguistics (pp. 13–16). Scitsiugnil Press. Retrieved from [URL]
Boulton, A. (2009). Data-driven learning: Reasonable fears and rational reassurance. Indian Journal of Applied Linguistics, 35(1), 81–106.
Boulton, A. (2010). Learning outcomes from corpus consultation. In M. Moreno Jaén, F. Serrano Valverde, & M. Calzada Pérez (Eds.), Exploring new paths in language pedagogy: Lexis and corpus-based language teaching (pp. 129–144). Equinox.
Boulton, A. (2011). Data-driven learning: The perpetual enigma. In S. Goźdź-Roszkowski (Ed.), Explorations across languages and corpora (pp. 563–580). Peter Lang.
Boulton, A. (2012). Corpus consultation for ESP: A review of empirical research. In A. Boulton, S. Carter-Thomas, & E. Rowley-Jolivet (Eds.), Corpus-informed research and learning in ESP: Issues and applications (pp. 261–291). John Benjamins.
Boulton, A. (2015). Applying data-driven learning to the web. In A. Leńko-Szymańska & A. Boulton (Eds.), Multiple affordances of language corpora for data-driven learning (pp. 267–295). John Benjamins.
Boulton, A. (2017). Corpora in language teaching and learning. Language Teaching, 50(4), 483–506.
Boulton, A., & Cobb, T. (2017). Corpus use in language learning: A meta-analysis. Language Learning, 67(2), 348–393.
Boulton, A., & Vyatkina, N. (2021). Thirty years of data-driven learning: Taking stock and charting new directions over time. Language Learning & Technology, 25(3).
Burston, J., & Arispe, K. (2018). Looking for a needle in a haystack: CALL and advanced language proficiency. Calico Journal, 35(1), 77–102.
Chambers, A. (2007). Popularising corpus consultation by language learners and teachers. In E. Hidalgo, L. Quereda, & J. Santana (Eds.), Corpora in the foreign language classroom (pp. 3–16). Rodopi.
Chambers, A. (2019). Towards the corpus revolution? Bridging the research–practice gap. Language Teaching, 52(4), 460–475.
Chambers, A., & Bax, S. (2006). Making CALL work: Towards normalisation. System, 34(4), 465–479.
Cobb, T., & Boulton, A. (2015). Classroom applications of corpus analysis. In D. Biber & R. Reppen (Eds.), Cambridge handbook of English corpus linguistics (pp. 478–497). Cambridge University Press.
Cresswell, A. (2007). Getting to ‘know’ connectors? Evaluating data-driven learning in a writing skills course. In E. Hidalgo, L. Quereda, & J. Santana (Eds.), Corpora in the foreign language classroom (pp. 267–287). Rodopi.
Crosthwaite, P., & Stell, A. (2019). It helps me get ideas on how to use my words: Primary school students’ initial reactions to corpus use in a private tutoring setting. In P. Crosthwaite (Ed.), Data-driven learning for the next generation: Corpora and DDL for pre-tertiary learners (pp. 150–170). Routledge.
Gilquin, G., & Granger, S. (2010). How can data-driven learning be used in language teaching? In A. O’Keeffe & M. McCarthy (Eds.), Routledge handbook of corpus linguistics (pp. 359–370). Routledge.
Gillespie, J. (2020). CALL research: Where are we now? ReCALL, 32(2), 127–144.
Han, Z. (2015). Striving for complementarity between narrative and meta-analytic reviews. Applied Linguistics, 36(3), 409–415.
Hanks, P. (2013). Lexical analysis: Norms and exploitations. The MIT Press.
Higgins, J., & Johns, T. (1984). Computers in language learning. Collins.
Hoey, M. (2005). Lexical priming: A new theory of words and language. Routledge.
Johns, T. (1988). Whence and whither classroom concordancing? In T. Bongaerts, P. de Haan, S. Lobbe, & H. Wekker (Eds.), Computer applications in language learning (pp. 9–27). Foris.
Johns, T. (1990). From printout to handout: Grammar and vocabulary teaching in the context of data-driven learning. CALL Austria, 10, 14–34.
Johns, T. (1991). Should you be persuaded: Two samples of data-driven learning materials. In T. Johns & P. King (Eds.), Classroom concordancing. English Language Research Journal, 4, 1–16.
Johns, T. (1993). Data-driven learning: An update. TELL&CALL, 2, 4–10.
Johns, T., & King, P. (1991). Editors’ preface. In T. Johns & P. King (Eds.), Classroom concordancing. English Language Research Journal, 4, iii–iv.
Johns, T. (1997). Contexts: The background, development and trialling of a concordance-based CALL program. In A. Wichmann, S. Fligelstone, T. McEnery, & G. Knowles (Eds.), Teaching and language corpora (pp. 100–115). Addison Wesley Longman.
Johns, T., Lee, H., & Wang, L. (2008). Integrating corpus-based CALL programs and teaching English through children’s literature. Computer Assisted Language Learning, 21(5), 483–506.
Lee, H., Warschauer, M., & Lee, J. H. (2019). The effects of corpus use on second language vocabulary learning: A multilevel meta-analysis. Applied Linguistics, 40(5), 721–753.
Luo, Q. (2016). The effects of data-driven learning activities on EFL learners’ writing development. SpringerPlus, 5, n.p.
Ma, B. (1993). Small-corpora concordancing in ESL teaching and learning. Hong Kong Papers in Linguistics and Language Teaching, 16, 11–30.
McEnery, T., & Wilson, A. (1997). Teaching and language corpora (TALC). ReCALL, 9(1), 5–14.
McKay, S. (1980). Teaching the syntactic, semantic and pragmatic dimensions of verbs. TESOL Quarterly, 14(1), 17–26.
Mizumoto, A., & Chujo, K. (2015). A meta-analysis of data-driven learning approach in the Japanese EFL classroom. English Corpus Studies, 22, 1–18.
Pérez-Paredes, P. (2019). A systematic review of the uses and spread of corpora and data-driven learning in CALL research during 2011–2015. Computer Assisted Language Learning.
Plonsky, L., & Oswald, F. L. (2014). How big is ‘big’? Interpreting effect sizes in L2 research. Language Learning, 64(4), 878–912.
Plonsky, L., & Ziegler, N. (2016). The CALL–SLA interface: Insights from a second-order synthesis. Language Learning & Technology, 20(2), 17–37. 10125/44459
Römer, U. (2011). Corpus research applications in second language teaching. Annual Review of Applied Linguistics, 31, 205–225.
Shintani, N., Li, S., & Ellis, R. (2013). Comprehension-based versus production-based grammar instruction: A meta-analysis of comparative studies. Language Learning, 63(2), 296–329.
Sinclair, J. (1991). Corpus, concordance, collocation. Oxford University Press.
Taylor, J. (2012). The mental corpus: How language is represented in the mind. Oxford University Press.
Wray, A. (2002). Formulaic language and the lexicon. Cambridge University Press.