Regression analysis


A family of statistical methods known collectively as regression analysis has long been used routinely to draw quantitatively motivated inferences from research data. While a systematic comparison between regression analysis and other kinds of statistical techniques goes beyond the scope of this chapter, a number of interrelated benefits support the applicability of regression analysis in the study of pragmatics. First, provided that certain general criteria are met, it yields reliable and robust results in a reproducible fashion. Second, once one is familiar with the basic logic underlying regression analysis, the results are relatively easy and straightforward to interpret. Third, regression analysis is very flexible in the sense that a similar research design with a similar logic of reasoning can be applied to a range of research questions and to different types of variables. Fourth, and tying the previous points together, regression analyses have become widely used, so the information provided by studies that use such techniques is easily accessible to a wide audience and the results of different studies are easier to compare, which ultimately contributes to the transparency and the cumulative nature of the scientific method. (For linguistically oriented discussions on the benefits of various forms of regression analyses, see e.g. Jaeger 2008; Johnson 2009; Tagliamonte and Baayen 2012; Gries 2015; Klavan and Divjak 2016; Plonsky and Oswald 2017.)



Baayen, Harald
2008 Analyzing linguistic data: A practical introduction. Cambridge and New York: Cambridge University Press.
Baayen, R. Harald, Doug J. Davidson and Douglas M. Bates
2008 “Mixed-effects modeling with crossed random effects for subjects and items.” Journal of Memory and Language 59 (4): 390–412.
Biber, Douglas
2012 “Register as a predictor of linguistic variation.” Corpus Linguistics and Linguistic Theory 8 (1): 9–37.
2014 “Using multi-dimensional analysis to explore cross-linguistic universals of register variation.” Languages in Contrast 14 (1): 7–34.
Cangemi, Francesco, Martina Krüger and Martine Grice
2015 “Listener-specific perception of speaker-specific production in intonation.” In Individual Differences in Speech Production and Perception, ed. by Susanne Fuchs, Daniel Pape, Caterina Petrone and Pascal Perrier, 123–145. Frankfurt: Peter Lang.
Čermák, František and Alexandr Rosen
2012 “The case of InterCorp, a multilingual parallel corpus.” International Journal of Corpus Linguistics 17 (3): 411–427.
Council of Europe
2001 Common European framework of reference for languages: Learning, teaching, assessment. Cambridge: Cambridge University Press.
Ellis, Nick C.
2016 “Salience, Cognition, Language Complexity, and Complex Adaptive Systems.” Studies in Second Language Acquisition 38 (2): 341–351.
Forstmeier, Wolfgang and Holger Schielzeth
2011 “Cryptic multiple hypotheses testing in linear models: overestimated effect sizes and the winner’s curse.” Behavioral Ecology and Sociobiology 65 (1): 47–55. Berlin/Heidelberg: Springer.
Gries, Stefan Th.
2015 “The most under-used statistical method in corpus linguistics: Multi-level (and mixed-effects) models.” Corpora 10 (1): 95–125.
2021a Statistics for Linguistics with R. 3rd edition. Berlin: De Gruyter Mouton.
2021b “(Generalized Linear) Mixed-Effects Modeling: A Learner Corpus Example.” Language Learning. Ahead-of-press. (19 March 2021).
Hakulinen, Auli, Maria Vilkuna, Riitta Korhonen, Vesa Koivisto, Tarja Riitta Heinonen and Irja Alho
2004 Iso suomen kielioppi. Helsinki: Suomalaisen kirjallisuuden seura. http://scripta.kotus.fi/visk URN:ISBN:978-952-5446-35-7
Hout, Roeland van
1994 “Statistics.” Handbook of Pragmatics: Manual. Amsterdam: John Benjamins. Handbook of Pragmatics Online.
Ivaska, Ilmari
2014 “The Corpus of Advanced Learner Finnish (LAS2): Database and toolkit to study academic learner Finnish.” Apples – Journal of Applied Language Studies 8 (3): 21–38.
2015 “Longitudinal changes in academic learner Finnish: A key structure analysis.” International Journal of Learner Corpus Research 1 (2): 210–241.
Ivaska, Ilmari, Markku Nikulin and Elisa Reunanen
2021 The Corpus of Academic Finnish. Turku: University of Turku.
Jaeger, T. Florian
2008 “Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models.” Journal of Memory and Language 59 (4): 434–446.
Jaeger, T. Florian, Peter Graff, William Croft and Daniel Pontillo
2011 “Mixed effect models for genetic and areal dependencies in linguistic typology.” Linguistic Typology 15 (2): 281–320.
Jantunen, Jarmo
2011 “Kansainvälinen oppijansuomen korpus (ICLFI): typologia, taustamuuttujat ja annotointi.” Lähivõrdlusi. Lähivertailuja 21: 86–105.
Johnson, Daniel Ezra
2009 “Getting off the GoldVarb Standard: Introducing Rbrul for Mixed-Effects Variable Rule Analysis.” Language and Linguistics Compass 3 (1): 359–383.
Kenny, David A. and Charles M. Judd
1986 “Consequences of violating the independence assumption in analysis of variance.” Psychological Bulletin 99 (3): 422–431.
Klavan, Jane and Dagmar Divjak
2016 “The cognitive plausibility of statistical classification models: Comparing textual and behavioral evidence.” Folia Linguistica 50 (2): 355–384.
Larsson, Tove, Luke Plonsky and Gregory R. Hancock
2020 “On the benefits of structural equation modeling for corpus linguists.” Corpus Linguistics and Linguistic Theory. Ahead-of-print.
Mauranen, Anna
2000 “Strange strings in translated language: A study on corpora.” In Intercultural Faultlines: Research Models in Translation Studies, ed. by Maeve Olohan, 119–141. Manchester: St Jerome Publishing.
Mundry, Roger and Charles L. Nunn
2009 “Stepwise Model Fitting and Statistical Inference: Turning Noise into Signal Pollution.” The American Naturalist 173 (1): 119–123.
Norouzian, Reza, Michael de Miranda and Luke Plonsky
2018 “The Bayesian Revolution in Second Language Research: An Applied Approach.” Language Learning 68 (4): 1032–1075.
Norris, John M.
2015 “Statistical Significance Testing in Second Language Research: Basic Problems and Suggestions for Reform.” Language Learning 65 (S1): 97–126.
Pallaskallio, Ritva
2003 “Uutisaika. Finiittiverbin aikamuodoista katastrofiuutisissa 1892–1994.” Virittäjä 107 (1): 27–45. https://journal.fi/virittaja/article/view/40236 (15 April 2021).
Plonsky, Luke and Frederick L. Oswald
2017 “Multiple regression as a flexible alternative to ANOVA in L2 research.” Studies in Second Language Acquisition 39 (3): 579–592.
R Core Team
2018 R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/
Roettger, Timo B.
2019 “Researcher degrees of freedom in phonetic research.” Laboratory Phonology 10 (1): 1–27.
Röthlisberger, Melanie, Jason Grafmiller and Benedikt Szmrecsanyi
2017 “Cognitive indigenization effects in the English dative alternation.” Cognitive Linguistics 28 (4): 673–710.
Serlin, Ronald C. and Joel R. Levin
1985 “Teaching How to Derive Directly Interpretable Coding Schemes for Multiple Regression Analysis.” Journal of Educational Statistics 10 (3): 223–238.
Simmons, Joseph P., Leif D. Nelson and Uri Simonsohn
2011 “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.” Psychological Science 22 (11): 1359–1366.
Szmrecsanyi, Benedikt
2019 “Register in variationist linguistics.” Register Studies 1 (1): 76–99.
Tagliamonte, Sali A. and R. Harald Baayen
2012 “Models, forests and trees of York English: Was/were variation as a case study for statistical practice.” Language Variation and Change 24: 135–178.
Whittingham, Mark J., Philip A. Stephens, Richard B. Bradbury and Robert P. Freckleton
2006 “Why Do We Still Use Stepwise Modelling in Ecology and Behaviour?” The Journal of Animal Ecology 75 (5): 1182–1189.
Wieling, Martijn, John Nerbonne and R. Harald Baayen
2011 “Quantitative Social Dialectology: Explaining Linguistic Variation Geographically and Socially.” PLoS ONE 6 (9): e23613.
Winter, Bodo
2020 Statistics for Linguists: An introduction using R. London: Routledge.
Winter, Bodo and Martine Grice
2021 “Independence and generalizability in linguistics.” Linguistics 59 (5): 1251–1277.