Chapter published in:
Urban Matters: Current approaches in variationist sociolinguistics
Edited by Arne Ziegler, Stefanie Edler and Georg Oberdorfer
[Studies in Language Variation 27] 2021
pp. 253–278
Testing models of diffusion of morphosyntactic innovations in Twitter data
Deepthi Gopal | University of Cambridge
Tamsin Blaxter | University of Cambridge
David Willis | University of Oxford
Adrian Leemann | University of Bern
Established models of the spatial diffusion of linguistic innovations vary in their relationship to population density. Differences in prediction between gravity models (Trudgill 1974), in which the probability of diffusion is sensitive to settlement size, and traditional wave models can be challenging to test because large-scale, fine-grained geographical sampling is difficult. This paper tests the suitability of data derived from Twitter for establishing diffusion patterns. Using two case studies from British English – variation in the realisation of ditransitives, and preposition drop with go – we propose that the correlation between (local) population density and linguistic similarity to geographical neighbours can be used as a measure of hierarchical patterning for an individual innovation.
Keywords: dialectology, syntactic variation, computational sociolinguistics, British English, dative alternation
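The measure proposed in the abstract, a correlation between local population density and linguistic similarity to geographical neighbours, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the chapter: the toy sites, the use of the k nearest sites as "neighbours", similarity defined as one minus the mean absolute difference in variant rate, log-transformed density, and Pearson's r as the correlation.

```python
import math

# Hypothetical toy data: seven sites on a line. Three dense "cities" have
# adopted an innovation (rate 0.9); the sparse sites between them have not.
SITES = [
    {"x": 0, "y": 0, "density": 5000, "rate": 0.9},
    {"x": 1, "y": 0, "density": 50, "rate": 0.1},
    {"x": 2, "y": 0, "density": 50, "rate": 0.1},
    {"x": 3, "y": 0, "density": 5000, "rate": 0.9},
    {"x": 4, "y": 0, "density": 50, "rate": 0.1},
    {"x": 5, "y": 0, "density": 50, "rate": 0.1},
    {"x": 6, "y": 0, "density": 5000, "rate": 0.9},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def neighbour_similarity(sites, k=2):
    """Per site: 1 - mean absolute rate difference to its k nearest sites.

    This definition of similarity is an illustrative assumption.
    """
    sims = []
    for i, site in enumerate(sites):
        others = [t for j, t in enumerate(sites) if j != i]
        others.sort(
            key=lambda t: math.dist((site["x"], site["y"]), (t["x"], t["y"]))
        )
        nearest = others[:k]
        diff = sum(abs(site["rate"] - t["rate"]) for t in nearest) / len(nearest)
        sims.append(1.0 - diff)
    return sims


def hierarchy_score(sites, k=2):
    """Correlate log population density with similarity to neighbours."""
    sims = neighbour_similarity(sites, k)
    log_density = [math.log(s["density"]) for s in sites]
    return pearson(log_density, sims)


score = hierarchy_score(SITES)
print(round(score, 3))
```

In this toy configuration the dense sites differ sharply from their sparse neighbours, so the score comes out strongly negative. How such a value should be interpreted for real data, and how neighbours and similarity are actually defined, is a matter for the chapter itself.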
Article outline
- 1. Introduction
- 2. Methodology and corpus construction
- 2.1 Corpus structure
- 2.1.1 Localisation
- 3. Mapping the distribution of morphosyntactic variants
- 3.1 Dative alternation revisited
- 3.2 Preposition drop
- 4. Approaches to the quantification of diffusion
- 4.1 Measurement
- 4.1.1 Simulated data
- 4.2 Evaluating real data
- 5. Conclusions
- Notes
- References
Available under the Creative Commons Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND) 4.0 license.
For any use beyond this license, please contact the publisher at rights@benjamins.nl.
Published online: 16 December 2021
https://doi.org/10.1075/silv.27.11bla
References
Bailey, Guy et al.
Bailey, Laura R.
Bamman, David, Jacob Eisenstein and Tyler Schnoebelen
Biggs, Alison
2014 Passive variation in the dialects of Northwest British English. Paper presented at the 3rd Conference of the International Society for the Linguistics of English (ISLE), University of Zürich, 24–27 August. https://www.isle-linguistics.org/assets/content/documents/hogg/Biggs--passive_variation--2014.pdf (4 February 2020)
Bresnan, Joan W. and Marilyn Ford
Burridge, James
Doyle, Gabriel
2014 Mapping dialectal variation by querying social media. In Shuly Wintner, Sharon Goldwater and Stefan Riezler (eds.), Proceedings of the 14th conference of the European chapter of the Association for Computational Linguistics, 98–106. Gothenburg: Association for Computational Linguistics. https://www.aclweb.org/anthology/E14-1011.pdf. 
Eisenstein, Jacob
Gast, Volker
Gerwin, Johanna
Gonçalves, Bruno and David Sánchez
Grieve, Jack et al.
Grieve, Jack, Andrea Nini and Diansheng Guo
Haddican, William
Haddican, William and Daniel E. Johnson
Hägerstrand, Torsten
Hall, David
Hecht, Brent and Monica Stephens
Huang, Yuan et al.
Jones, Taylor
Labov, William
Malik, Momin et al.
2015 Population bias in geotagged tweets. In Derek Ruths and Jürgen Pfeffer (eds.), Standards and Practices in Large-Scale Social Media Research: Papers from the 2015 ICWSM Workshop. https://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/view/10662 (10 January 2020)
Myler, Neil
Nguyen, Dong et al.
2014 Why gender and age prediction from tweets is hard. Lessons from a crowdsourcing experiment. In Junichi Tsujii and Jan Hajić (eds.), Proceedings of COLING 2014, the 25th international conference on computational linguistics: Technical papers, 1950–1961. Dublin: Dublin City University and Association for Computational Linguistics.
Office for National Statistics, Geography Division
2016 Index of place names in Great Britain (July 2016). https://www.ons.gov.uk/methodology/geography/geographicalproducts/otherproducts/indexofplacenamesipn
Olsson, Gunnar
Ordnance Survey Ireland
2016 Townlands – OSi national placenames gazetteer. https://data-osi.opendata.arcgis.com/datasets/townlands-osi-national-placenames-gazetteer (10 January 2020)
Orton, Harold, Stewart Sanderson and John D. A. Widdowson
Pavalanathan, Umashanthi and Jacob Eisenstein
2015 Confounds and consequences in geotagged Twitter data. In Lluís Màrquez, Chris Callison-Burch and Jian Su (eds.), Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2138–2148. Stroudsburg, PA: Association for Computational Linguistics. https://www.aclweb.org/anthology/D15-1256
Reis, Stefan et al.
Russ, Brice
2012 Examining large-scale regional variation through online geotagged corpora. Paper presented at the Annual Meeting of the American Dialect Society, Portland. http://www.briceruss.com/ADStalk.pdf
Schmidt, Johannes
Shoemark, Philippa et al.
2017 Aye or naw, whit dae ye hink? Scottish independence and linguistic identity on social media. In Mirella Lapata, Phil Blunsom and Alexander Koller (eds.), Proceedings of the 15th conference of the European chapter of the Association for Computational Linguistics, vol. 1, long papers. Stroudsburg, PA: Association for Computational Linguistics.
Siewierska, Anna and Willem B. Hollmann
Stevenson, Jonathan
Strelluf, Christopher
Szmrecsanyi, Benedikt
Trudgill, Peter
Upton, Clive and John D. A. Widdowson
Wikle, Thomas and Guy Bailey
Willis, David
Wolk, Christoph et al.