Chapter published in: Corpora and the Changing Society: Studies in the evolution of English
Edited by Paula Rautionaho, Arja Nurmi and Juhani Klemola
[Studies in Corpus Linguistics 96] 2020
pp. 29–56
Changes in society and language
This study addresses how societal and linguistic changes can be detected using historical corpora, taking poverty and the industrial revolution as a case study. It is based on large historical corpora, in particular EEBO and CLMET3.0. The results, obtained with a rich array of state-of-the-art statistical approaches (such as kernel density estimation), show how poverty, the industrial revolution, and urbanization are associated through, for instance, the associations of war, religion, family, poverty, and suffering. The study also discusses the importance of data size and cleanness, the temptations of distant reading, and the need to validate the discovered patterns through the interaction of close reading and distant reading.
- 2. Data and pre-processing
- 2.1 The EEBO Collection as sampler corpus
- 2.2 The CLMET3.0 corpus
- 2.3 The pre-processing step of spelling normalization
- 3.1 Data-based and data-driven approaches
- 3.2 Document classification
- 3.3 Topic modelling
- 3.4 Conceptual maps
- 4. Results and discussion
- 4.1 Dictionary-based approach
- 4.2 Topic modelling
- 4.2.1 EEBO early vs. EEBO late
- 4.2.2 Adding CLMET3.0 and increasing the number of topics
- 4.3 Conceptual maps
Published online: 08 April 2020
Corpora and software
CLMET3.0 = Corpus of Late Modern English Texts, version 3.0. De Smet, Hendrik, Diller, Hans-Jürgen & Tyrkkö, Jukka (comps). https://perswww.kuleuven.be/~u0044428/
EEBO = Early English Books Online. Davies, Mark
Mallet = Machine Learning for LanguagE Toolkit