This text offers an introduction to binary logistic regression, a confirmatory technique for statistically modelling the effect of one or several predictors on a binary response variable. We explain why logistic regression is exceptionally well suited to the comparison of near-synonyms in corpus data: the technique allows the researcher to identify the different factors that have an impact on the choice between near-synonyms, and to tease apart their respective effects. Moreover, the technique is well suited to dealing with the unbalanced data sets that are typical of Corpus Linguistics. First, we describe the contexts in which logistic regression is applicable and give examples of the types of research questions for which it is an appropriate tool. Next, we explain why and how logistic regression analysis differs from linear regression analysis, and we illustrate how the output of a logistic regression analysis can be interpreted, using the study of an alternation pattern in Dutch as our example. The R code used in the case study is explained in detail, and a URL is given from which the R code and data sets can be downloaded. Finally, suggestions for further reading are given.
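As a brief sketch of the contrast between the two techniques: the notation below is an assumption made for illustration and is not taken from the article itself. With predictors x_1, ..., x_k and a binary response y, linear regression models the response directly, whereas logistic regression models the log odds of one of the two outcomes.

```latex
% Linear regression: the response itself is a linear function of the predictors
y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + \varepsilon

% Binary logistic regression: the log odds of the outcome are linear in the predictors
\log\frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k,
\qquad p = P(y = 1 \mid x_1, \dots, x_k)

% Equivalently, solving for the probability p
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}
```

Under this formulation, the predicted probability always stays between 0 and 1, and exp(β_j) can be read as the multiplicative change in the odds of the outcome for a one-unit increase in x_j, with the other predictors held constant.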