Logistic regression: A confirmatory technique for comparisons in corpus linguistics

Speelman, Dirk

doi:10.1075/hcp.43.18spe

Part of

Corpus Methods for Semantics: Quantitative studies in polysemy and synonymy
Edited by Dylan Glynn and Justyna A. Robinson
[Human Cognitive Processing 43] 2014
pp. 487–533

Dirk Speelman | University of Leuven

This text offers an introduction to binary logistic regression, a confirmatory technique for statistically modelling the effect of one or several predictors on a binary response variable. We explain why logistic regression is exceptionally well suited to the comparison of near-synonyms in corpus data: the technique allows the researcher to identify the different factors that have an impact on the choice between near-synonyms and to tease apart their respective effects. Moreover, the technique is well suited to dealing with the type of unbalanced data sets that are typical of corpus linguistics. First, we describe the contexts in which logistic regression is applicable and give examples of the types of research questions for which it is an appropriate tool. Next, we explain why and how logistic regression analysis differs from linear regression analysis and illustrate how its output can be interpreted, using the study of an alternation pattern in Dutch as our example. The R code used in the case study is explained in detail, and a URL is given from which the R code and data sets can be downloaded. Finally, suggestions for further reading are given.

Keywords: confirmatory statistics, outcome prediction, statistical modelling

Published online: 6 November 2014
