Exploring part-of-speech frequencies in a sociohistorical corpus of English
We investigate the usefulness of part-of-speech (POS) annotation as a tool in the study of sociolinguistic variation and genre evolution. We analyse how POS ratios change over time in the Parsed Corpus of Early English Correspondence (c.1410–1681), which social groups lead the changes, and whether the changes can be connected to colloquialisation with regard to reduced complexity or an increasingly involved style. While we find gentry-led colloquialisation in terms of noun and verb frequencies as well as evidence for gendered styles, the results on structural complexity are more mixed. We argue that POS annotation can be a useful tool when complemented by a thorough textual analysis, but that more fine-grained categories are needed to reach firmer conclusions.
Article outline
- 1. Introduction
- 2. Background
- 2.1 POS ratios in the study of (sociolinguistic) variation
- 2.2 Complexity in the genre of personal correspondence
- 3. Material and method
- 3.1 PCEEC and ReCEEC
- 3.2 Visualisation
- 4. Analysis
- 4.1 Complexity in the Parsed Corpus of Early English Correspondence
- 4.2 Colloquialisation and gendered styles
- 5. Discussion and conclusion
- Acknowledgements
- Notes
- References
- Appendix
References
Argamon, Shlomo, Moshe Koppel, Jonathan Fine & Anat Rachel Shimoni
2003 Gender, genre, and writing style in formal written texts.
Text 23(3). 321–346.


Atzmueller, Martin
2015 Subgroup discovery.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 5(1). 35–49.


Bamman, David, Jacob Eisenstein & Tyler Schnoebelen
2014 Gender identity and lexical variation in social media.
Journal of Sociolinguistics 18(2). 135–160.


Bell, Allan
1984 Language style as audience design.
Language in Society 13(2). 145–204.


Biber, Douglas
1988 Variation across speech and writing. Cambridge: Cambridge University Press.


Biber, Douglas
1992 On the complexity of discourse complexity: A multidimensional analysis.
Discourse Processes 15(2). 133–163.


Biber, Douglas
1995 Dimensions of register variation. Cambridge: Cambridge University Press.


Biber, Douglas & Jena Burges
2000 Historical change in the language use of women and men: Gender differences in dramatic dialogue.
Journal of English Linguistics 28(1). 21–37.


Biber, Douglas & Susan Conrad
2009 Register, genre, and style (Cambridge Textbooks in Linguistics). Cambridge: Cambridge University Press.


Biber, Douglas & Edward Finegan
1989 Drift and the evolution of English style: A history of three genres.
Language 65(3). 487–517.


Biber, Douglas & Edward Finegan
1997 Diachronic relations among speech-based and written registers in English.
In Terttu Nevalainen & Leena Kahlas-Tarkka (eds.), To explain the present: Studies in the changing English language in honour of Matti Rissanen (Mémoires de la Société Néophilologique de Helsinki 52), 253–275. Helsinki: Société Néophilologique.

Biber, Douglas & Bethany Gray
2013 Being specific about historical change: The influence of sub-register.
Journal of English Linguistics 41(2). 104–134.


Biber, Douglas & Bethany Gray
Biber, Douglas, Bethany Gray & Shelley Staples
2016 Predicting patterns of grammatical complexity across language exam task types and proficiency levels.
Applied Linguistics 37(5). 639–668.


Carpenter, Bob, Andrew Gelman, Matt Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li & Allen Riddell
2017 Stan: A probabilistic programming language.
Journal of Statistical Software 76(1).


Chafe, Wallace
1982 Integration and involvement in speaking, writing, and oral literature.
In Deborah Tannen (ed.), Spoken and written language, 35–53. Norwood, NJ: Ablex.

Halliday, M. A. K. & Ruqaiya Hasan
1976 Cohesion in English. London & New York: Longman.

Heylighen, Francis & Jean-Marc Dewaele
2002 Variation in the contextuality of language: An empirical measure.
Foundations of Science 7(3). 293–340.


Hinneburg, Alexander, Heikki Mannila, Samuli Kaislaniemi, Terttu Nevalainen & Helena Raumolin-Brunberg
2007 How to handle small samples: Bootstrap and Bayesian methods in the analysis of linguistic change.
Literary and Linguistic Computing 22(2). 137–150.


Huddleston, Rodney & Geoffrey K. Pullum (eds.)
2002 The Cambridge grammar of the English language. Cambridge: Cambridge University Press.


Hudson, Richard
1994 About 37% of word-tokens are nouns.
Language 70(2). 331–339.


Karlsson, Fred
2008 Complexity in linguistic theorizing.
The Mental Lexicon 9(2). 144–169.

Labov, William
1982 Building on empirical foundations.
In Winfred P. Lehmann & Yakov Malkiel (eds.), Perspectives on historical linguistics: Papers from a conference held at the meeting of the Language Theory Division, Modern Language Assn, San Francisco, 27–30 December 1979 (Current Issues in Linguistic Theory 24), 17–92. Amsterdam: John Benjamins.


Labov, William
1990 The intersection of sex and social class in the course of linguistic change.
Language Variation and Change 2(2). 205–254.


Labov, William
1994 Principles of linguistic change, volume 1: Internal factors. Oxford: Blackwell.

Laslett, Peter
1965 The world we have lost. New York: Charles Scribner’s Sons.

Lehto, Anu
2015 The genre of Early Modern English statutes: Complexity in historical legal language (Mémoires de la Société Néophilologique de Helsinki 97). Helsinki: Société Néophilologique.

Mair, Christian, Marianne Hundt, Geoffrey Leech & Nicholas Smith
Mäkelä, Eetu, Tanja Säily & Terttu Nevalainen
2016 Khepri – a modular view-based tool for exploring (historical sociolinguistic) data.
In Maciej Eder & Jan Rybicki (eds.), Digital Humanities 2016: Conference abstracts, 269–272. Kraków: Jagiellonian University & Pedagogical University.

Markus, Manfred
2001 The development of prose in Early Modern English in view of the gender question: Using grammatical idiosyncracies of 15th and 17th century letters.
European Journal of English Studies 5(2). 181–196.


Meurman-Solin, Anneli
2011 Utterance-initial connective elements in early Scottish epistolary prose.
In Anneli Meurman-Solin & Ursula Lenker (eds.), Connectives in synchrony and diachrony in European languages (Studies in Variation, Contacts and Change in English 8). Helsinki: VARIENG.
[URL] (17 December, 2016.)

Nevala, Minna
2004 Address in early English correspondence: Its forms and socio-pragmatic functions (Mémoires de la Société Néophilologique de Helsinki 64). Helsinki: Société Néophilologique.

Nevalainen, Terttu
2002 Language and woman’s place in earlier English.
Journal of English Linguistics 30(2). 181–199.


Nevalainen, Terttu & Helena Raumolin-Brunberg
2003 Historical sociolinguistics: Language change in Tudor and Stuart England (Longman Linguistics Library). London: Pearson Education.

Newman, Matthew L., Carla J. Groom, Lori D. Handelman & James W. Pennebaker
2008 Gender differences in language use: An analysis of 14,000 text samples.
Discourse Processes 45(3). 211–236.


Palander-Collin, Minna
1999 Grammaticalization and social embedding: I THINK and METHINKS in Middle and Early Modern English (Mémoires de la Société Néophilologique de Helsinki 55). Helsinki: Société Néophilologique.

Palander-Collin, Minna
2000 The language of husbands and wives in seventeenth-century correspondence.
In Christian Mair & Marianne Hundt (eds.), Corpus linguistics and linguistic theory. Papers from the twentieth International Conference on English Language Research on Computerized Corpora (ICAME 20), Freiburg im Breisgau 1999 (Language and Computers: Studies in Practical Linguistics 33), 289–300. Amsterdam: Rodopi.

PCEEC = Parsed Corpus of Early English Correspondence, tagged version
2006 Annotated by Arja Nurmi, Ann Taylor, Anthony Warner, Susan Pintzuk & Terttu Nevalainen. Compiled by the CEEC Project Team. York: University of York & Helsinki: University of Helsinki. Distributed through the Oxford Text Archive.
[URL] (17 December, 2016.)

Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech & Jan Svartvik
1985 A comprehensive grammar of the English language. London: Longman.

R Core Team
2016 R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.
[URL] (17 December, 2016.)

Raumolin-Brunberg, Helena & Terttu Nevalainen
2007 Historical sociolinguistics: The Corpus of Early English Correspondence.
In Joan C. Beal, Karen P. Corrigan & Hermann L. Moisl (eds.), Creating and digitizing language corpora, volume 2: Diachronic databases, 148–171. Houndmills: Palgrave Macmillan.


Rayson, Paul, Geoffrey Leech & Mary Hodges
Rescher, Nicholas
1998 Complexity: A philosophical overview. New Brunswick, NJ: Transaction Publishers.

Säily, Tanja, Terttu Nevalainen & Harri Siirtola
2011 Variation in noun and pronoun frequencies in a sociohistorical corpus of English.
Literary and Linguistic Computing 26(2). 167–188.


Santorini, Beatrice
2016 Annotation manual for the Penn Historical Corpora and the York-Helsinki Corpus of Early English Correspondence.
[URL] (17 December, 2016.)

Schiffrin, Deborah
1987 Discourse markers. Cambridge: Cambridge University Press.


Siirtola, Harri, Poika Isokoski, Tanja Säily & Terttu Nevalainen
2016 Interactive text visualization with Text Variation Explorer.
In Ebad Banissi, Mark W. McK. Bannatyne, Fatma Bouali, Remo Burkhard, John Counsell, Urska Cvek, Martin J. Eppler, Georges Grinstein, Wei Dong Huang, Sebastian Kernbach, Chun-Cheng Lin, Feng Lin, Francis T. Marchese, Chi Man Pun, Muhammad Sarfraz, Marjan Trutschl, Anna Ursyn, Gilles Venturini, Theodor G. Wyeld & Jian J. Zhang (eds.), Proceedings of the 20th international conference on Information Visualisation (IV 2016), 330–335. Los Alamitos, CA: IEEE Computer Society.


Siirtola, Harri, Terttu Nevalainen, Tanja Säily & Kari-Jouko Räihä
2011 Visualisation of text corpora: A case study of the PCEEC.
In Terttu Nevalainen & Susan M. Fitzmaurice (eds.), How to deal with data: Problems and approaches to the investigation of the English language over time and space (Studies in Variation, Contacts and Change in English 7). Helsinki: VARIENG.
[URL] (17 December, 2016.)

Siirtola, Harri, Tanja Säily, Terttu Nevalainen & Kari-Jouko Räihä
Tannen, Deborah
1991 You just don’t understand: Women and men in conversation. New York: Morrow and Company.

Taylor, Ann
2007 The York-Toronto-Helsinki Parsed Corpus of Old English Prose.
In Joan C. Beal, Karen P. Corrigan & Hermann L. Moisl (eds.), Creating and digitizing language corpora, volume 2: Diachronic databases, 196–227. Houndmills: Palgrave Macmillan.


Taylor, Ann & Beatrice Santorini
2006 The Parsed Corpus of Early English Correspondence. University of York.
[URL] (17 December, 2016.)
Vartiainen, Turo, Tanja Säily & Mikko Hakala
2013 Variation in pronoun frequencies in early English letters: Gender-based or relationship-based?
In Jukka Tyrkkö, Olga Timofeeva & Maria Salenius (eds.), Ex philologia lux: Essays in honour of Leena Kahlas-Tarkka (Mémoires de la Société Néophilologique de Helsinki 90), 233–255. Helsinki: Société Néophilologique.

Cited by
Cited by 3 other publications
Leiwo, Martti
2020.
L2 Greek in Roman Egypt: Intense language contact in Roman military forts.
Journal of Historical Sociolinguistics 6:2

Saario, Lassi, Tanja Säily, Samuli Kaislaniemi & Terttu Nevalainen
2021.
The burden of legacy: Producing the Tagged Corpus of Early English Correspondence Extension (TCEECE).
Research in Corpus Linguistics 9:1, pp. 104 ff.

This list is based on CrossRef data as of 16 September 2023. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.