Chapter published in:
Language and Text: Data, models, information and applications
Edited by Adam Pawłowski, Jan Mačutek, Sheila Embleton and George Mikros
[Current Issues in Linguistic Theory 356] 2021
pp. 209–224
Topological mapping for visualisation of high-dimensional historical linguistic data
Hermann Moisl | Newcastle University
This paper addresses an issue in the visualization of high-dimensional data abstracted from historical corpora, one whose importance in quantitative and corpus linguistics has thus far not been sufficiently appreciated: the possibility that the data is nonlinear. Most applications of data visualization in these fields use linear proximity measures, which ignore nonlinearity and, if the data is significantly nonlinear, can give misleading results. Topological mapping is a nonlinear visualization method; its application is exemplified here, using one particular topological mapping method, the Self-Organizing Map, with reference to a small historical text corpus.
Keywords: Historical linguistics, nonlinearity, high-dimensional data, topological mapping, clustering
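The abstract's central claim, that a linear proximity measure can mislead when data is nonlinear, can be illustrated with a minimal sketch. It is not taken from the chapter and does not use its data or the Self-Organizing Map: the curved "manifold" is assumed to be an arc of the unit circle, and the helper names `arc_points` and `path_length` are hypothetical. Straight-line (Euclidean) distance treats the two ends of the arc as close neighbours, whereas distance measured along the curve shows them to be far apart.

```python
import math

def arc_points(n=50, start=0.0, end=1.9 * math.pi):
    """Sample n points along a unit-circle arc: a curved 1-D manifold in 2-D (assumed toy data)."""
    step = (end - start) / (n - 1)
    return [(math.cos(start + i * step), math.sin(start + i * step)) for i in range(n)]

def path_length(points):
    """Approximate distance along the curve by summing neighbour-to-neighbour steps."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

points = arc_points()

# Linear proximity: straight-line distance between the two ends of the arc.
straight = math.dist(points[0], points[-1])

# Proximity that respects the curvature: distance travelled along the arc.
along_curve = path_length(points)

print(f"straight-line distance between the ends: {straight:.3f}")
print(f"distance along the curve:                {along_curve:.3f}")
```

A visualization built on the straight-line measure would place the two ends of the arc next to each other; a method that preserves neighbourhood relations along the curve would not.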
Article outline
- 1. Introduction
- 2. Nonlinearity
- 2.1 Nonlinearity in natural processes
- 2.2 Nonlinearity in data
- 2.3 Nonlinearity in linguistic data
- 3. The problem
- 4. Topological mapping
- 4.1 Topology
- 4.2 Projection of topological structure into low-dimensional space
- 4.3 Preservation of nonlinearity
- 4.4 Example
- 4.4.1 The text collection
- 4.4.2 Spelling data
- 4.4.3 The Self-Organizing Map
- 4.4.4 Result
- 5. Conclusion
- References
Published online: 22 December 2021
https://doi.org/10.1075/cilt.356.14moi