Chapter in: Broadening the Spectrum of Corpus Linguistics: New approaches to variability and change
Edited by Susanne Flach and Martin Hilpert
[Studies in Corpus Linguistics 105] 2022
pp. 257–284
MuPDAR for corpus-based learner and variety studies
Two (more) suggestions for improvement
Corpus-based studies of learner language and (especially) English varieties have become more quantitative in nature and increasingly use regression-based methods and classifiers such as classification trees and random forests. One recent development that is becoming more widely used is the MuPDAR (Multifactorial Prediction and Deviation Analysis using Regressions) approach of Gries and Deshors (2014) and Gries and Adelman (2014). This approach attempts to improve on traditional regression- or tree-based approaches in three steps: first, a model/classifier is trained on the reference speakers (often native speakers in learner corpus studies or British English speakers in variety studies); second, this model/classifier is used to predict what such a reference speaker would produce in the situation the target speaker (often a non-native or indigenized-variety speaker) is in; third, one determines whether the target speakers made the canonical choice, that is, the choice predicted for the reference speakers, and explores that variability with a second regression model or classifier. The present paper is a follow-up to Gries and Deshors (2020) and offers additional answers to a variety of questions that readers and audiences of MuPDAR presentations have been raising for a few years. First, I show how MuPDAR can be extended straightforwardly to alternations that involve more than the typically used binary choices; I do so in a way that also addresses another potential challenge, and I exemplify this with a case study from varieties research. Second, I outline a casewise-similarity approach towards predicting what reference speakers would do that avoids frequent regression modeling problems; I exemplify it, and compare it to competing alternatives, with a case study from learner corpus research.
Keywords: corpus-based alternation research, learner corpus research, variety research, MuPDAR, predictive modeling
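The three MuPDAR steps summarized in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: the classifier shown here (a majority vote among the k most similar reference cases, with a Gower-style similarity) is only one assumed way of realizing a casewise-similarity prediction, and the function names, the predictors `recipient` and `length_diff`, and the toy data are all hypothetical.

```python
from collections import Counter


def gower_similarity(a, b, numeric_ranges):
    """Mean per-feature similarity between two cases.

    Categorical features score 1 on an exact match and 0 otherwise;
    numeric features score 1 - |difference| / range.
    """
    scores = []
    for key, value in a.items():
        if key in numeric_ranges:
            span = numeric_ranges[key]
            scores.append(1.0 - abs(value - b[key]) / span if span else 1.0)
        else:
            scores.append(1.0 if value == b[key] else 0.0)
    return sum(scores) / len(scores)


def predict_reference_choice(case, reference, numeric_ranges, k=3):
    """Steps 1-2: majority choice among the k most similar reference cases."""
    ranked = sorted(
        reference,
        key=lambda ref: gower_similarity(case, ref["features"], numeric_ranges),
        reverse=True,
    )
    votes = Counter(ref["choice"] for ref in ranked[:k])
    return votes.most_common(1)[0][0]


def flag_deviations(targets, reference, numeric_ranges, k=3):
    """Step 3: record whether each target speaker made the predicted choice."""
    results = []
    for target in targets:
        predicted = predict_reference_choice(
            target["features"], reference, numeric_ranges, k
        )
        results.append(
            {**target, "predicted": predicted,
             "canonical": target["choice"] == predicted}
        )
    return results


# Invented toy data for a dative alternation (not taken from the chapter).
reference = [
    {"features": {"recipient": "pronoun", "length_diff": 4}, "choice": "ditransitive"},
    {"features": {"recipient": "pronoun", "length_diff": 2}, "choice": "ditransitive"},
    {"features": {"recipient": "pronoun", "length_diff": 3}, "choice": "ditransitive"},
    {"features": {"recipient": "np", "length_diff": -3}, "choice": "prepositional"},
    {"features": {"recipient": "np", "length_diff": -1}, "choice": "prepositional"},
    {"features": {"recipient": "np", "length_diff": -4}, "choice": "prepositional"},
]
numeric_ranges = {"length_diff": 8}  # observed range in the reference data: 4 - (-4)

targets = [
    {"features": {"recipient": "pronoun", "length_diff": 3}, "choice": "prepositional"},
    {"features": {"recipient": "np", "length_diff": -2}, "choice": "prepositional"},
]

results = flag_deviations(targets, reference, numeric_ranges)
for row in results:
    print(row["choice"], row["predicted"], row["canonical"])
```

In a full analysis, the `canonical` flag would serve as the response variable of the second regression model or classifier mentioned in the abstract; that step is omitted here. Because the vote is taken over choice labels, the same sketch applies unchanged when the alternation has more than two variants.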
- 1.1 General introduction
- 1.2 Motivation of the present paper
- 2. Case study 1: The dative and voice alternation across varieties
- 2.2 MuPDAR: Steps i and ii
- 2.3 MuPDAR: Step iii for a multinomial context
- 2.3.1 The simple version
- 2.3.2 The better version
- 2.4 MuPDAR: Step iv for a multinomial context
- 2.5 Interim conclusion
- 3. Case study 2: The dative alternation by learners
- 3.2 The proposed classifier
- 3.3.1 Prediction accuracies without and with ‘either’ cases
- 3.3.2 Comparison and validation
- 4. Concluding remarks
Baayen, R. Harald & Ramscar, Michael
Bernaisch, Tobias, Gries, Stefan Th. & Mukherjee, Joybrato
Boulesteix, Anne-Laure, Janitza, Silke, Hapfelmeier, Alexander, Van Steen, Kristel & Strobl, Carolin
Daelemans, Walter, Zavrel, Jakub, van der Sloot, Ko & van den Bosch, Antal
2018 TiMBL: Tilburg Memory-Based Learner. Version 6.4 Reference Guide. ILK Technical Report – ILK 11–01. https://github.com/LanguageMachines/timbl/raw/master/docs/Timbl_6.4_Manual.pdf (4 April 2022).
Deshors, Sandra C.
Deshors, Sandra C. & Gries, Stefan Th.
Deshors, Sandra C. & Gries, Stefan Th.
Divjak, Dagmar S., Arppe, Antti & Dąbrowska, Ewa
Gower, J. C.
Gries, Stefan Th.
Gries, Stefan Th. & Adelman, Allison S.
Gries, Stefan Th. & Deshors, Sandra C.
Heller, Benedikt, Bernaisch, Tobias & Gries, Stefan Th.
Klavan, Jane & Divjak, Dagmar S.
Kolbe-Hanna, Daniela & Baldus, Lina
2018 The choice between -ing and to complement clauses in English as first, second and foreign language. Paper presented at ICAME 39, University of Tampere.
Kruger, Haidee & De Sutter, Gert
Lester, Nicholas A.
Milin, Petar, Divjak, Dagmar S., Dimitrijević, Strahinja & Baayen, R. Harald
Werner, Valentin, Fuchs, Robert & Götz, Sandra
Wright, Marvin N., Ziegler, Andreas & König, Inke R.
Wulff, Stefanie & Gries, Stefan Th.