Chapter published in:
Corpus Pragmatic Studies on the History of Medical Discourse
Edited by Turo Hiltunen and Irma Taavitsainen
[Pragmatics & Beyond New Series 330] 2022
pp. 49–78
Chapter 3. Medical topics and style from 1500 to 2018
A corpus-driven exploration
Gerold Schneider | University of Zurich
This chapter investigates changes in medical topics, style and language across roughly 500 years, from 1500 to 2018. To do so, we employ data-driven methods from Computational Linguistics and Digital Humanities: document classification, topic modelling, and automatically constructed conceptual maps. We trace changes from traditional thinking in the scholastic period to empirical methods and professionalised medicine, and finally to a shift away from symptom-centred medicine towards the increasing importance of data, statistics and clinical studies. We conclude that medical discourse has undergone radical changes, and that data-driven methods reflect these changes and offer an unprecedented overview. We also critically discuss shortcomings of our data and methods.
Keywords: data-driven approaches, machine learning, collocations, Topic Modelling, history of medicine, Digital Humanities, conceptual maps, Kernel Density Estimation, automated content analysis, English medical discourse, language and health, culturomics
Article outline
- 1. Introduction
- 2. Motivation
- 2.1 Systematic comparison of all lexical features
- 2.2 Advanced computational methods
- 2.3 Sampling and representativeness
- 3. Materials
- 3.1 CEEM
- 3.2 ARCHER Medical
- 3.3 HIMERA
- 3.4 PubMed Excerpt
- 3.5 Overview of the complete data of our investigation
- 3.6 Limitations of the data
- 4. Methods
- 4.1 Data preparation
- 4.2 Supervised document classification
- 4.3 Unsupervised topic modelling
- 4.4 Unsupervised Conceptual Maps with Kernel Density Estimation
- 5. Results
- 5.1 Results of supervised document classification
- 5.2 Results of unsupervised topic modelling
- 5.3 Results of Unsupervised Conceptual Maps with Kernel Density Estimation
- 6. Conclusion and future prospects
- Acknowledgements
- Notes
- References
Published online: 01 July 2022
https://doi.org/10.1075/pbns.330.03sch