Topological mapping for visualisation of high-dimensional historical linguistic data
This paper addresses an issue in the visualization of high-dimensional data abstracted from historical corpora whose importance in quantitative and corpus linguistics has thus far not been sufficiently appreciated: the possibility that the data is nonlinear. Most applications of data visualization in these fields use linear proximity measures, which ignore nonlinearity and can give misleading results if the data is significantly nonlinear. Topological mapping is a nonlinear visualization method, and its application via a particular topological mapping method, the Self-Organizing Map, is here exemplified with reference to a small historical text corpus.
Article outline
- 1. Introduction
- 2. Nonlinearity
- 2.1 Nonlinearity in natural processes
- 2.2 Nonlinearity in data
- 2.3 Nonlinearity in linguistic data
- 3. The problem
- 4. Topological mapping
- 4.1 Topology
- 4.2 Projection of topological structure into low-dimensional space
- 4.3 Preservation of nonlinearity
- 4.4 Example
- 4.4.1 The text collection
- 4.4.2 Spelling data
- 4.4.3 The Self-Organizing Map
- 4.4.4 Result
- 5. Conclusion
- References