Regression analysis, an umbrella term for a family of statistical methods, has long been used routinely to draw quantitatively motivated inferences from research data. While a systematic comparison between regression analysis and other statistical techniques goes beyond the scope of this chapter, a number of interrelated benefits support its applicability in the study of pragmatics. First, provided that certain general criteria are met, it yields reliable and robust results in a reproducible fashion. Second, once one is familiar with the basic logic underlying regression analysis, the results are relatively easy and straightforward to interpret. Third, regression analysis is very flexible in the sense that a similar research design with a similar logic of reasoning can be applied to a range of research questions and to different types of variables. Fourth, and tying the previous points together, regression analyses have become widely used, so the information provided by studies that make use of such techniques is easily accessible to a wide audience and the results of different studies are easier to compare, which ultimately contributes to the transparency and the cumulative nature of the scientific method. (For linguistically oriented discussions on the benefits of various forms of regression analyses, see e.g. Jaeger 2008; Johnson 2009; Tagliamonte and Baayen 2012; Gries 2015; Klavan and Divjak 2016; Plonsky and Oswald 2017.)
References
Baayen, Harald
2008 Analyzing linguistic data: A practical introduction. Cambridge and New York: Cambridge University Press.
Baayen, R. Harald, Doug J. Davidson and Douglas M. Bates
2008 “Mixed-effects modeling with crossed random effects for subjects and items.” Journal of Memory and Language 59 (4): 390–412.
Biber, Douglas
2012 “Register as a predictor of linguistic variation.” Corpus Linguistics and Linguistic Theory 8 (1): 9–37.
Biber, Douglas
2014 “Using multi-dimensional analysis to explore cross-linguistic universals of register variation.” Languages in Contrast 14 (1): 7–34.
Cangemi, Francesco, Martina Krüger and Martine Grice
2015 “Listener-specific perception of speaker-specific production in intonation.” In Individual Differences in Speech Production and Perception, ed. by Susanne Fuchs, Daniel Pape, Caterina Petrone and Pascal Perrier, 123–145. Frankfurt: Peter Lang.
Čermák, František and Alexandr Rosen
2012 “The case of InterCorp, a multilingual parallel corpus.” International Journal of Corpus Linguistics 17 (3): 411–427.
Council of Europe
2001 Common European framework of reference for languages: Learning, teaching, assessment. Cambridge: Cambridge University Press.
Ellis, Nick C.
2016 “Salience, Cognition, Language Complexity, and Complex Adaptive Systems.” Studies in Second Language Acquisition 38 (2): 341–351.
Forstmeier, Wolfgang and Holger Schielzeth
2011 “Cryptic multiple hypotheses testing in linear models: overestimated effect sizes and the winner’s curse.” Behavioral Ecology and Sociobiology 65 (1): 47–55. Berlin/Heidelberg: Springer.
Gries, Stefan Th.
2015 “The most under-used statistical method in corpus linguistics: Multi-level (and mixed-effects) models.” Corpora 10 (1): 95–125.
Gries, Stefan Th.
2021a Statistics for Linguistics with R. 3rd edition. Berlin: De Gruyter Mouton.
Gries, Stefan Th.
2021b “(Generalized Linear) Mixed-Effects Modeling: A Learner Corpus Example.” Language Learning. Ahead-of-press. (19 March 2021).
Hakulinen, Auli, Maria Vilkuna, Riitta Korhonen, Vesa Koivisto, Tarja Riitta Heinonen and Irja Alho
1994 “Statistics.” Handbook of Pragmatics: Manual. Amsterdam: John Benjamins. Handbook of Pragmatics Online:
Ivaska, Ilmari
2014 “The Corpus of Advanced Learner Finnish (LAS2): Database and toolkit to study academic learner Finnish.” Apples – Journal of Applied Language Studies 8 (3): 21–38.
Ivaska, Ilmari
2015 “Longitudinal changes in academic learner Finnish: A key structure analysis.” International Journal of Learner Corpus Research 1 (2): 210–241.
Ivaska, Ilmari, Markku Nikulin and Elisa Reunanen
2021 The Corpus of Academic Finnish. Turku: University of Turku.
Jaeger, T. Florian
2008 “Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models.” Journal of Memory and Language 59 (4): 434–446.
Jaeger, T. Florian, Peter Graff, William Croft and Daniel Pontillo
2011 “Mixed effect models for genetic and areal dependencies in linguistic typology.” Linguistic Typology 15 (2): 281–320.
Jantunen, Jarmo
2011 “Kansainvälinen oppijansuomen korpus (ICLFI): typologia, taustamuuttujat ja annotointi.” Lähivõrdlusi. Lähivertailuja 21: 86–105.
Johnson, Daniel Ezra
2009 “Getting off the GoldVarb Standard: Introducing Rbrul for Mixed-Effects Variable Rule Analysis.” Language and Linguistics Compass 3 (1): 359–383.
Kenny, David A. and Charles M. Judd
1986 “Consequences of violating the independence assumption in analysis of variance.” Psychological Bulletin 99 (3): 422–431.
Klavan, Jane and Dagmar Divjak
2016 “The cognitive plausibility of statistical classification models: Comparing textual and behavioral evidence.” Folia Linguistica 50 (2): 355–384.
Larsson, Tove, Luke Plonsky and Gregory R. Hancock
2020 “On the benefits of structural equation modeling for corpus linguists.” Corpus Linguistics and Linguistic Theory. Ahead-of-print.
Mauranen, Anna
2000 “Strange strings in translated language: A study on corpora.” In Intercultural Faultlines: Research Models in Translation Studies, ed. by Maeve Olohan, 119–141. Manchester: St Jerome Publishing.
Mundry, Roger and Charles L. Nunn
2009 “Stepwise Model Fitting and Statistical Inference: Turning Noise into Signal Pollution.” The American Naturalist 173 (1): 119–123.
Norouzian, Reza, Michael de Miranda and Luke Plonsky
2018 “The Bayesian Revolution in Second Language Research: An Applied Approach.” Language Learning 68 (4): 1032–1075.
Norris, John M.
2015 “Statistical Significance Testing in Second Language Research: Basic Problems and Suggestions for Reform.” Language Learning 65 (S1): 97–126.
Plonsky, Luke and Frederick L. Oswald
2017 “Multiple regression as a flexible alternative to ANOVA in L2 research.” Studies in Second Language Acquisition 39 (3): 579–592.
R Core Team
2018 R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/
Roettger, Timo B.
2019 “Researcher degrees of freedom in phonetic research.” Laboratory Phonology 10 (1): 1–27.
Röthlisberger, Melanie, Jason Grafmiller and Benedikt Szmrecsanyi
2017 “Cognitive indigenization effects in the English dative alternation.” Cognitive Linguistics 28 (4): 673–710.
Serlin, Ronald C. and Joel R. Levin
1985 “Teaching How to Derive Directly Interpretable Coding Schemes for Multiple Regression Analysis.” Journal of Educational Statistics 10 (3): 223–238.
Simmons, Joseph P., Leif D. Nelson and Uri Simonsohn
2011 “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.” Psychological Science 22 (11): 1359–1366.
Szmrecsanyi, Benedikt
2019 “Register in variationist linguistics.” Register Studies 1 (1): 76–99.
Tagliamonte, Sali A. and R. Harald Baayen
2012 “Models, forests and trees of York English: Was/were variation as a case study for statistical practice.” Language Variation and Change 24: 135–178.
Whittingham, Mark J., Philip A. Stephens, Richard B. Bradbury and Robert P. Freckleton
2006 “Why Do We Still Use Stepwise Modelling in Ecology and Behaviour?” The Journal of Animal Ecology 75 (5): 1182–1189.
Wieling, Martijn, John Nerbonne and R. Harald Baayen
2011 “Quantitative Social Dialectology: Explaining Linguistic Variation Geographically and Socially.” PLoS ONE 6 (9): e23613.
Winter, Bodo
2020 Statistics for Linguists: An introduction using R. London: Routledge.
Winter, Bodo and Martine Grice
2021 “Independence and generalizability in linguistics.” Linguistics 59 (5): 1251–1277.