This paper applies principal components analysis (PCA) to the problem of interpreting pre-existing corpus text categories in the analysis of linguistic variation. The method is illustrated by constructing an index of the complex notion "formality" from a PCA of a set of counts based on high-frequency wordforms. The first principal component from this analysis acts as a broad formality index; a second principal component is tentatively identified as marking "concrete facts" versus "abstract discussion". Text categories from the corpora are then positioned on these textual dimensions, and selected categories are evaluated for internal consistency by comparing the distribution of texts across subcategories. Finally, suggestions are made concerning further developments and applications of the method, and its implications for the use of corpora in variation studies.
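The general shape of such an analysis can be sketched as follows. This is a minimal illustration and not the paper's actual procedure: the wordforms, the rates per 1,000 words, the category labels, and the choice to standardize each column before the decomposition are all assumptions made for the example, and `pca_scores` is a hypothetical helper name.

```python
import numpy as np


def pca_scores(counts, n_components=2):
    """Return (scores, loadings, explained_ratio) from a PCA of standardized columns."""
    X = np.asarray(counts, dtype=float)
    # Standardize each wordform's rate so that very frequent items do not dominate.
    # (Assumption: every column has non-zero variance.)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized data: the rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, loadings, explained[:n_components]


# Invented data: one row per text, one column per wordform (rate per 1,000 words).
wordforms = ["the", "of", "I", "you"]
categories = ["press", "press", "learned", "learned", "fiction", "fiction"]
rates = np.array([
    [68.0, 36.0, 3.0, 2.0],
    [66.0, 34.0, 4.0, 3.0],
    [72.0, 42.0, 1.0, 0.5],
    [70.0, 40.0, 1.5, 1.0],
    [52.0, 22.0, 14.0, 11.0],
    [50.0, 20.0, 16.0, 12.0],
])

scores, loadings, explained = pca_scores(rates)

# The sign of a principal component is arbitrary; orient the first component
# so that "the" loads positively and higher scores read as "more formal".
if loadings[0, 0] < 0:
    scores[:, 0] *= -1
    loadings[:, 0] *= -1

# Position each text category on the first component by averaging its texts.
category_means = {
    c: float(np.mean([s for s, k in zip(scores[:, 0], categories) if k == c]))
    for c in sorted(set(categories))
}

for c, m in sorted(category_means.items(), key=lambda kv: kv[1]):
    print(f"{c:8s} {m:+.2f}")
```

In this toy data the first component separates texts rich in "the" and "of" from texts rich in "I" and "you", so the category means fall along a rough formality scale. The spread of individual text scores within a category could be inspected in the same way to gauge its internal consistency.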