Corpus driven identification of lexical bundle obsolescence in Late Modern English

Tichý, Ondřej

doi:10.1075/slcs.218.04tic

Part of

Lost in Change: Causes and processes in the loss of grammatical elements and constructions
Edited by Svenja Kranich and Tine Breban
[Studies in Language Companion Series 218] 2021
pp. 101–130

Ondřej Tichý | Charles University

This chapter explores a new methodology for extracting, from large corpora (especially the Google ngrams dataset of the Google Books project), multi-word units that were once common but have since become obsolete. It complements a modified frequency-based methodology previously used for detecting lexical obsolescence (Tichý 2018) with a bottom-up approach to calculating association measures in multi-word sequences, inspired by Wahl & Gries (2019). The analytical part examines expressions identified as potentially obsolete on their way from Late Modern to Present-day English. The conditions, circumstances and consequences of the loss of such expressions are considered, with a focus on competing forms that express similar functions and may be recognized as supplanting the old forms.
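As a rough illustration of the two ingredients named in the abstract, the following minimal Python sketch pairs a frequency-based obsolescence filter with an association score. It is not the chapter's implementation: the function names, the per-period input format, and all thresholds are hypothetical. The association score is logDice (Rychlý 2008), chosen here only because it appears in the reference list; that the chapter relies on it is an assumption.

```python
import math


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice association score (Rychlý 2008): 14 + log2(2*f_xy / (f_x + f_y)).

    f_xy is the co-occurrence frequency of the two parts of a sequence,
    f_x and f_y are the frequencies of the parts on their own.
    """
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def is_obsolescence_candidate(
    ipm_by_period: dict[int, float],
    min_peak: float = 1.0,         # hypothetical: "once common" threshold (per million)
    max_late_ratio: float = 0.05,  # hypothetical: late frequency at most 5% of the peak
    late_periods: int = 3,         # hypothetical: number of most recent periods checked
) -> bool:
    """Flag a sequence that was once common and has since nearly disappeared."""
    periods = sorted(ipm_by_period)
    early = [ipm_by_period[p] for p in periods[:-late_periods]]
    late = [ipm_by_period[p] for p in periods[-late_periods:]]
    if not early:
        return False  # not enough history to compare against
    peak = max(early)
    return peak >= min_peak and max(late) <= max_late_ratio * peak


# Invented frequencies (instances per million), for illustration only.
declining = {1800: 5.0, 1850: 4.0, 1900: 1.0, 1950: 0.1, 1980: 0.05, 2000: 0.0}
stable = {1800: 2.0, 1850: 2.0, 1900: 2.0, 1950: 2.0, 1980: 2.0, 2000: 2.0}

print(is_obsolescence_candidate(declining))
print(is_obsolescence_candidate(stable))
print(log_dice(f_xy=25, f_x=50, f_y=50))
```

In a sketch like this, the frequency filter would shortlist sequences whose use collapsed over time, and the association score would then help separate cohesive multi-word units from chance word combinations.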

Keywords: lexicology, corpus linguistics, diachronic linguistics, obsolescence, ngrams, lexical bundles, multi-word expressions, Late Modern English, Google Books

Article outline

1. Introduction
2. Material
3. Methodology
- 3.1 Thresholds
- 3.2 Selection
4. Technical aspects
5. Analysis
- 5.1 Trash
- 5.2 Results
  - 5.2.1 Terminology
  - 5.2.2 “Quasi” terminology
  - 5.2.3 Appellations
  - 5.2.4 Legal/administrative phrases
  - 5.2.5 Dating
  - 5.2.6 Pragmatic markers
  - 5.2.7 Replacement in collocations
  - 5.2.8 Countability and accommodation
  - 5.2.9 Complex verb phrase
6. Discussion
7. Conclusions
Acknowledgements
Notes
References
Appendix

Published online: 16 June 2021

https://doi.org/10.1075/slcs.218.04tic

References

Aitchison, Jean

2012 Words in the Mind: An Introduction to the Mental Lexicon. Oxford: Wiley-Blackwell.

Biber, Douglas, Johansson, Stig, Leech, Geoffrey, Conrad, Susan & Finegan, Edward

2000 Longman Grammar of Spoken and Written English. London: Longman.

Coleman, Robert

1990 The assessment of lexical mortality and replacement between Old and Modern English. In Papers from the 5th International Conference on English Historical Linguistics [Current Issues in Linguistic Theory 65], Sylvia M. Adamson, Vivien A. Law, Nigel Vincent & Susan Wright (eds), 69–86. Amsterdam: John Benjamins.

Cvrček, Václav

Corpus Confidence Calculator. <[URL]> (27 April 2019).

Denison, David

1998 Syntax. In The Cambridge History of the English Language, Vol. 4: 1776–1997, Suzanne Romaine (ed.). Cambridge: CUP.

Farradane, J., Poulton, R. K. & Datta, M. S.

1965 Problems in analysis and terminology for information retrieval. Journal of Documentation 21(4): 287–90.

Iyeiri, Yoko

2018 Causative make and its infinitival complements in Early Modern English. In Explorations in English Historical Syntax [Studies in Language Companion Series 198], Hubert Cuyckens, Hendrik De Smet, Liesbet Heyvaert & Charlotte Maekelberghe (eds), 139–58. Amsterdam: John Benjamins.

Kilgarriff, Adam

2015 How many words are there? In The Oxford Handbook of the Word, John R. Taylor (ed.), 29–37. Oxford: OUP.

Maixner, Vítězslav

1970 Zánik Slov v Nové Angličtině.

Michel, Jean-Baptiste, Shen, Yuan Kui, Aiden, Aviva Presser, Veres, Adrian, Gray, Matthew K., The Google Books Team, Pickett, Joseph P. et al.

2011 Quantitative analysis of culture using millions of digitized books. Science 331(6014): 176–182.

Milton, James & Donzelli, Giovanna

2013 The lexicon. In The Cambridge Handbook of Second Language Acquisition, Julia Herschensohn & Martha Young-Scholten (eds), 441–60. Cambridge: CUP.

Němec, Igor

1968 Strukturní předpoklady zániku slov. Slovo a Slovesnost 29(2): 152–58. <[URL]> (5 November 2020).

Oxford English Dictionary

n.d. Key to frequency. Oxford: OUP. [URL] (22 April 2019).

Petersen, Alexander M., Tenenbaum, Joel, Havlin, Shlomo & Stanley, H. Eugene

2012 Statistical laws governing fluctuations in word use from word birth to word death. Scientific Reports 2 (March): 313.

Rudnicka, Karolina

2019 The statistics of obsolescence: Purpose subordinators in Late Modern English. Basel: NIHIN.

Rychlý, Pavel

2008 A lexicographer-friendly association score. RASLAN 2008, 6–9. Brno: Masarykova Univerzita.

The British National Corpus, Version 2 (BNC World)

2001 Praha: Distributed by Oxford University Computing Services on behalf of the BNC Consortium. Ústav Českého národního korpusu FF UK. <[URL]>

Tichý, Ondřej

2018 Lexical obsolescence and loss in English: 1700–2000. In Applications of Pattern-Driven Methods in Corpus Linguistics [Studies in Corpus Linguistics 82], Joanna Kopaczyk & Jukka Tyrkkö (eds), 81–103. Amsterdam: John Benjamins.

Trench, Richard Chenevix

1871 English, Past and Present. New York, NY: Charles Scribner and Co.

Wahl, Alexander & Gries, Stefan T.

2019 Computational extraction of formulaic sequences from corpora: Two case studies of a new extraction algorithm. In Computational Phraseology [IVITRA Research in Linguistics and Literature 24], Gloria Corpas Pastor & Jean-Pierre Colson (eds), 84–110. Amsterdam: John Benjamins.