Uralic typology in the light of a new comprehensive dataset
This paper presents the Uralic Areal Typology Online (UraTyp 1.0), a typological dataset of 35 Uralic languages and a total of 360 features, mainly covering morphology, syntax, and phonology. The features fall into two sets: the definitions of 195 features originate from the Grambank (GB) database, developed for typological comparison of the world's languages, whereas 165 features (UT) have been designed specifically to describe the typological variation within the Uralic language family. We present a series of analyses of the dataset that demonstrate its scope and possibilities. In a Principal Components Analysis, the complete dataset correctly identifies the main Uralic subgroups, whereas the GB data alone are insufficiently granular to detect this family-internal structure. Similar analyses restricted to individual typological subdomains give variable results. A model-based admixture analysis identifies four distinct areas of historical interaction: Saami, Finnic, the Volga area, and Ob-Ugric.
- 2. UraTyp & Uralic languages in Grambank
- 2.1 Previous systematic documentation of Uralic typological diversity
- 2.2 Creating the UraTyp database
- 2.2.1 Uralic languages in Grambank
- 2.2.2 Defining the UT features
- 2.2.3 Coding the Uralic languages
- 2.2.4 Combining GB and UT data into UraTyp
- 2.3 Data availability
- 3. Statistical analyses of the UraTyp data
- 3.1 Overview of the variation in UraTyp data
- 3.2 Clustering UraTyp and its subsets with PCA
- 3.3 What distinguishes Uralic subfamilies?
- 3.4 Diachronic patterns of typological admixture
- 5. Conclusions and future perspectives