A lectometric analysis of aggregated lexical variation in written Standard English with Semantic Vector Space models
Lectometry is a corpus-based methodology that explores, from an aggregate perspective, how multiple language-external dimensions shape language usage. The paper combines this methodology with Semantic Vector Space (SVS) modeling to investigate lexical variability in written Standard English, as sampled in the original Brown family of corpora (Brown, LOB, Frown and F-LOB). Based on a joint analysis of 303 lexical variables, which are semi-automatically extracted by means of an SVS, we find that lexical variation in the Brown family is systematically related to three lectal dimensions: discourse type (informative versus imaginative), standard variety (British English versus American English), and time period (1960s versus 1990s). Most lexical variables are sensitive to at least one of these three language-external dimensions, yet not every dimension has dedicated lexical variables: in particular, distinctive lexical variables for the real-time dimension fail to emerge.
Published online: 31 March 2016