Chapter published in: Applications of Pattern-driven Methods in Corpus Linguistics
Edited by Joanna Kopaczyk and Jukka Tyrkkö
[Studies in Corpus Linguistics 82] 2018
pp. 81–104
Chapter 4. Lexical obsolescence and loss in English: 1700–2000
This chapter explores a new methodology for extracting, from large corpora, forms that were once common but are now obsolete. Starting from the relatively under-researched problem of lexical mortality, or obsolescence in general, it formulates two closely related procedures for querying the n-gram data of the Google Books project. The procedures identify the best candidates for words and lexical expressions that may have become lost or obsolete over the last three centuries, from the Late Modern era to Present-day English (1700–2000). After describing the techniques used to process large unigram and trigram data sets, the chapter offers a selective analysis of the results and proposes ways in which the methodology may help corpus linguists as well as historical lexicographers.
Keywords: lexicology, corpus linguistics, diachronic linguistics, obsolescence, n-grams, lexical bundles, Late Modern English, Google Books
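The general idea summarised in the abstract, finding forms that were frequent early in the period and are absent later, can be illustrated with a minimal sketch. This is not the chapter's actual procedure: the function name, the time windows, the thresholds, and the toy records are all assumptions made for illustration, and the sketch compares raw counts rather than frequencies normalised by corpus size per year. It assumes the n-gram data have already been parsed into (ngram, year, match_count) tuples.

```python
from collections import defaultdict


def obsolescence_candidates(records, early=(1700, 1799), late=(1900, 2000),
                            min_early=100, max_late=0):
    """Return (ngram, early_count, late_count) tuples for n-grams that occur
    at least `min_early` times in the early window and at most `max_late`
    times in the late window, sorted by early count (descending).

    All parameters are hypothetical defaults chosen for illustration.
    """
    early_counts = defaultdict(int)
    late_counts = defaultdict(int)

    # Sum the raw match counts per n-gram inside each time window.
    for ngram, year, count in records:
        if early[0] <= year <= early[1]:
            early_counts[ngram] += count
        elif late[0] <= year <= late[1]:
            late_counts[ngram] += count

    # Keep n-grams that were frequent early and (nearly) absent late.
    candidates = [
        (ngram, n_early, late_counts.get(ngram, 0))
        for ngram, n_early in early_counts.items()
        if n_early >= min_early and late_counts.get(ngram, 0) <= max_late
    ]
    return sorted(candidates, key=lambda c: (-c[1], c[0]))


# Toy records with invented tokens and counts (not real corpus data).
records = [
    ("alpha", 1710, 80), ("alpha", 1750, 60), ("alpha", 1850, 5),
    ("beta", 1720, 200), ("beta", 1950, 300),
    ("gamma", 1790, 10),
]

for ngram, n_early, n_late in obsolescence_candidates(records):
    print(ngram, n_early, n_late)
```

In this toy run only "alpha" qualifies: "beta" is still attested in the late window and "gamma" falls below the early-window threshold. A realistic application would also need the pruning steps named in the chapter outline (proper names, OCR errors, variety-specific forms), which this sketch does not attempt.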
- 1.1 Research questions
- 1.2 Theoretical problems and practical definitions
- 2. The corpus and its problems
- 2.1 The n-grams
- 3.1 Data requirements
- 3.2 Word obsolescence
- 3.3 Pruning and sorting the results
- a. Proper names
- b. OCR errors
- c. Variety-specific forms
- 3.4 Obsolescence of multi-word expressions
- 4. Analysis and discussion of the results
- 4.3 Future research
Published online: 13 March 2018