Text Categories and Where You Can Stick Them
A Crude Formality Index
This paper applies principal components analysis (PCA) to the problem of interpreting pre-existing corpus text categories for the analysis of linguistic variation. The method is illustrated by constructing an index of the complex notion "formality" from a PCA of a set of high-frequency wordform-based counts. The first principal component from this analysis acts as a broad formality index; a second principal component is tentatively identified as marking "concrete facts" versus "abstract discussion". Text categories from the corpora are then positioned on these textual dimensions, and selected categories are evaluated for internal consistency by comparing the distribution of texts across subcategories. Finally, suggestions are made concerning further developments and applications of the method, and its implications for the use of corpora in variation studies.
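The general technique described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's actual procedure: the abstract does not say which wordforms were counted or how the counts were preprocessed, so the feature list, the toy rates, the category labels, the column standardization, and the helper name `pca_scores` are all hypothetical.

```python
import numpy as np


def pca_scores(counts, n_components=2):
    """PCA via SVD on column-standardized data.

    Returns (scores, loadings, explained_ratio) for the first n_components.
    Standardizing the columns is an assumption made for this sketch.
    """
    X = np.asarray(counts, dtype=float)
    # Centre and scale each wordform column so no single word dominates.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal axes of the standardized data.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, loadings, explained[:n_components]


# Hypothetical toy data: rows are texts, columns are rates per 1,000 words.
features = ["the", "of", "I", "you"]
texts = ["report_a", "report_b", "chat_a", "chat_b", "letter_a"]
counts = np.array([
    [70.0, 40.0, 2.0, 1.0],
    [65.0, 38.0, 3.0, 2.0],
    [40.0, 15.0, 30.0, 25.0],
    [42.0, 18.0, 28.0, 27.0],
    [55.0, 28.0, 15.0, 12.0],
])

scores, loadings, explained = pca_scores(counts)

# The sign of a principal component is arbitrary. Here PC1 is oriented so
# that "of" loads positively, i.e. higher scores mean "more formal" in this toy.
if loadings[features.index("of"), 0] < 0:
    scores[:, 0] *= -1
    loadings[:, 0] *= -1

# Position each (hypothetical) text category on PC1 by its mean score.
categories = {"report": [0, 1], "chat": [2, 3], "letter": [4]}
category_means = {
    name: float(scores[idx, 0].mean()) for name, idx in categories.items()
}

for name, mean in sorted(category_means.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} PC1 mean = {mean:+.2f}")
```

Comparing the spread of individual text scores within one category, rather than only the category mean, is one simple way to look at the internal consistency the abstract mentions.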