Spurious effects in variational corpus linguistics
Identification and implications of confounding
As repositories of spontaneously realized language, corpora generally have an uncontrolled and unbalanced structure in which all variables operate simultaneously. Consequently, a variable’s real effect can be concealed when it is studied in isolation, because the impact of other, potentially confounding variables is left out of the analysis. Using a variational case study, the alternation between inflected and uninflected attributive adjectives in Dutch, we demonstrate how confounding variables alter the impact of explanatory variables on the response variable, resulting in spurious effects in the bivariate analyses. Multiple Correspondence Analysis is used as a heuristic tool to unveil the association patterns between explanatory variables in the data matrix that induce the spurious effects. Based on these findings, we argue for a thorough analysis of the database patterns to gain insight into the underlying associations between explanatory variables before modeling their real impact on the response variable in a multivariate model.
Keywords: confounding, Multiple Correspondence Analysis, variational linguistics, spurious effects
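The mechanism described in the abstract, a bivariate effect that disappears once a confounding variable is taken into account, can be illustrated with a minimal sketch. The token counts and the labels `x_0`, `x_1`, `z_a` and `z_b` are hypothetical and chosen only for illustration; they are not taken from the Dutch adjective data. The sketch uses plain stratification rather than Multiple Correspondence Analysis.

```python
from collections import defaultdict

# Hypothetical token counts (NOT from the study):
# (confounder level, explanatory level) -> (uninflected tokens, all tokens)
counts = {
    ("z_a", "x_0"): (16, 80),
    ("z_a", "x_1"): (4, 20),
    ("z_b", "x_0"): (12, 20),
    ("z_b", "x_1"): (48, 80),
}


def pooled_rates(counts):
    """Bivariate view: proportion of uninflected tokens per explanatory
    level, ignoring the confounder."""
    totals = defaultdict(lambda: [0, 0])
    for (_, x), (hits, n) in counts.items():
        totals[x][0] += hits
        totals[x][1] += n
    return {x: hits / n for x, (hits, n) in totals.items()}


def stratified_rates(counts):
    """Same proportion, computed separately within each confounder level."""
    return {key: hits / n for key, (hits, n) in counts.items()}


pooled = pooled_rates(counts)
stratified = stratified_rates(counts)

# Pooled over the confounder, x_1 appears to favour the uninflected form.
print("pooled:", pooled)
# Within each confounder level, x_0 and x_1 show the same proportion.
print("stratified:", stratified)
```

In these toy counts the explanatory variable has no effect within either level of the confounder, yet the pooled proportions differ because the explanatory levels are unevenly distributed over the confounder levels, which themselves differ in their rate of uninflected forms.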
Published online: 25 October 2014