Text Categories and Where You Can Stick Them
A Crude Formality Index
This paper applies principal components analysis (PCA) to the problem of interpreting pre-existing corpus text categories for the analysis of linguistic variation. The method is illustrated by constructing an index of the complex notion "formality" from a PCA of a set of high-frequency wordform-based counts. The first principal component from this analysis acts as a broad formality index; a second principal component is tentatively identified as marking "concrete facts" versus "abstract discussion". Text categories from the corpora are then positioned on these textual dimensions, and selected categories are evaluated for internal consistency by comparing the distribution of texts across subcategories. Finally, suggestions are made concerning further developments and applications of the method, and its implications for the use of corpora in variation studies.
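As a rough illustration of the kind of procedure the abstract describes, the sketch below counts a handful of high-frequency wordforms per text, standardises the counts, and projects each text onto the first two principal components. It is a minimal sketch under stated assumptions, not the paper's implementation: the wordform list, the toy texts, the per-1,000-token normalisation, the column standardisation, and the function names `wordform_rates` and `pca_scores` are all hypothetical choices made for the example.

```python
import re
import numpy as np

# Hypothetical wordform set and toy texts; the paper's actual features
# and corpora are not specified in the abstract.
WORDFORMS = ["the", "of", "which", "i", "you"]
TEXTS = [
    "the analysis of the data which the committee of the board approved",
    "the report of the council which sets out the terms of the review",
    "i think you know i said you should come when i call you",
    "you and i both know the answer so i will tell you what i think",
]

def wordform_rates(texts, wordforms):
    """Return a texts x wordforms matrix of rates per 1,000 tokens."""
    rows = []
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        n = max(len(tokens), 1)  # guard against empty texts
        rows.append([1000.0 * tokens.count(w) / n for w in wordforms])
    return np.array(rows)

def pca_scores(X, n_components=2):
    """Standardise columns, then run PCA via singular value decomposition."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    Z = (X - X.mean(axis=0)) / std
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # text positions
    loadings = Vt[:n_components]                     # wordform weights
    explained = (S ** 2) / np.sum(S ** 2)            # variance shares
    return scores, loadings, explained[:n_components]

X = wordform_rates(TEXTS, WORDFORMS)
scores, loadings, explained = pca_scores(X)

# The sign of each component is arbitrary, so which end counts as
# "formal" has to be read off the loadings, not assumed.
for text, row in zip(TEXTS, scores):
    print(f"{row[0]:8.3f} {row[1]:8.3f}  {text[:40]}")
```

In a sketch like this, the first column of `scores` plays the role of a candidate formality index, and averaging it over the texts in each corpus category would position the categories on that dimension. Whether the first component really tracks formality is an interpretive step that depends on inspecting `loadings`.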