This paper critically discusses how corpus linguistics in general, and learner corpus research in particular, has been dealing with
frequency data of all kinds, especially over- and underuse frequencies. On the basis of learner corpus data, I demonstrate
the pitfalls of using aggregate data and of the lack of statistical control that unfortunately characterize much work.
In fact, I will demonstrate that monofactorial methods have very little to offer to research on observational data. While
this paper is admittedly very didactic and methodological, I think the discussion of the empirical data offered here (a
reanalysis of previously published work) shows how misleading many studies potentially are and has far-reaching implications for
much of corpus linguistics and learner corpus research. Ideally, this paper, together with Paquot & Plonsky (2017, International Journal of Learner Corpus Research), would lead to a complete
revision of how learner corpus linguists use quantitative methods and study over-/underuse; minimally, it would stimulate
a much-needed discussion of the methodological sophistication that is currently lacking.