Lectometry is a corpus-based methodology that explores, from an aggregate perspective, how multiple language-external dimensions shape language usage. This paper combines the methodology with Semantic Vector Space (SVS) modeling to investigate lexical variability in written Standard English, as sampled in the original Brown family of corpora (Brown, LOB, Frown and F-LOB). Based on a joint analysis of 303 lexical variables, semi-automatically extracted by means of an SVS, we find that lexical variation in the Brown family is systematically related to three lectal dimensions: discourse type (informative versus imaginative), standard variety (British English versus American English), and time period (1960s versus 1990s). Most lexical variables are sensitive to at least one of these three language-external dimensions, yet not every dimension has dedicated lexical variables: in particular, distinctive lexical variables for the real-time dimension fail to emerge.