Uralic typology in the light of a new comprehensive dataset

Norvik, Miina; Jing, Yingqi; Dunn, Michael; Forkel, Robert; Honkola, Terhi; Klumpp, Gerson; Kowalik, Richard; Metslang, Helle; Pajusalu, Karl; Piha, Minerva; Saar, Eva; Saarinen, Sirkka; Vesakoski, Outi

doi:10.1075/jul.00002.nor

Article published in:

Journal of Uralic Linguistics
Vol. 1:1 (2022) ► pp.4–42


Miina Norvik | Uppsala University, Sweden | University of Turku, Finland | University of Tartu, Estonia

Yingqi Jing | Uppsala University, Sweden | University of Turku, Finland

Michael Dunn | Uppsala University, Sweden

Robert Forkel | Max Planck Institute for Evolutionary Anthropology, Germany

Terhi Honkola | University of Turku, Finland | University of Tartu, Estonia

Gerson Klumpp | University of Tartu, Estonia

Richard Kowalik | University of Stockholm, Sweden

Helle Metslang | University of Tartu, Estonia

Karl Pajusalu | University of Tartu, Estonia

Minerva Piha | Uppsala University, Sweden | University of Turku, Finland | University of Oulu, Finland

Eva Saar | University of Tartu, Estonia

Sirkka Saarinen | University of Turku, Finland

Outi Vesakoski | University of Turku, Finland

This paper presents the Uralic Areal Typology Online (UraTyp 1.0), a typological dataset of 35 Uralic languages and a total of 360 features, mainly covering the levels of morphology, syntax, and phonology. The features come from two sources: the definitions of 195 features originate from the Grambank (GB) database, developed for comparing the typology of the world's languages, whereas 165 features (UT) have been designed specifically to describe the typological variation within the Uralic language family. We present a series of analyses of the dataset demonstrating its scope and possibilities. The complete dataset correctly identifies the main Uralic subgroups in a Principal Components Analysis, whereas GB data alone is insufficiently granular to detect this family-internal structure. Similar analyses restricted to individual typological subdomains give variable results. A model-based admixture analysis identifies four distinct areas of historical interaction: Saami, Finnic, the Volga area and Ob-Ugric.

Keywords: Uralic languages, typology, areal linguistics, syntax, morphology, phonology, quantitative linguistics
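
As an illustration of the kind of analysis the abstract refers to, the following is a minimal sketch of a Principal Components Analysis on a binary language-by-feature matrix. It is not the authors' implementation, which is not described on this page. The language names, feature values, and helper functions are hypothetical toy material, not UraTyp data. The first component is found by power iteration, and missing values are not handled.

```python
import math
import random


def center_columns(matrix):
    """Subtract each feature's (column's) mean from its values."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def covariance(centered):
    """Sample covariance matrix of the already centered feature columns."""
    n = len(centered)
    cols = list(zip(*centered))
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols]
        for ci in cols
    ]


def first_component(cov, iterations=200, seed=0):
    """Approximate the leading eigenvector of cov by power iteration."""
    rng = random.Random(seed)
    v = [rng.random() for _ in cov]
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(centered, axis):
    """Score of each language on the given principal axis."""
    return [sum(x * a for x, a in zip(row, axis)) for row in centered]


# Hypothetical toy data: 4 invented "languages" coded for 6 binary features
# (1 = feature present, 0 = absent). Assumption for illustration only.
toy = {
    "lang_A1": [1, 1, 1, 0, 0, 0],
    "lang_A2": [1, 1, 0, 0, 0, 0],
    "lang_B1": [0, 0, 0, 1, 1, 1],
    "lang_B2": [0, 0, 1, 1, 1, 1],
}

names = list(toy)
centered = center_columns([toy[name] for name in names])
pc1 = first_component(covariance(centered))
scores = dict(zip(names, project(centered, pc1)))

for name in names:
    print(f"{name}: PC1 = {scores[name]:+.3f}")
```

In this toy matrix the two A languages receive PC1 scores of one sign and the two B languages scores of the opposite sign; which group is positive depends on the arbitrary orientation of the axis. A real analysis would additionally need a policy for features coded as unknown or not applicable.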

Article outline

1. Introduction
2. UraTyp & Uralic languages in Grambank
- 2.1 Previous systematic documentation of Uralic typological diversity
- 2.2 Creating the UraTyp database
  - 2.2.1 Uralic languages in Grambank
  - 2.2.2 Defining the UT features
  - 2.2.3 Coding the Uralic languages
  - 2.2.4 Combining GB and UT data into UraTyp
- 2.3 Data availability
3. Statistical analyses of the UraTyp data
- 3.1 Overview of the variation in UraTyp data
- 3.2 Clustering UraTyp and its subsets with PCA
- 3.3 What distinguishes Uralic subfamilies?
- 3.4 Diachronic patterns of typological admixture
4. Discussion
5. Conclusions and future perspectives
Acknowledgments
References

Available under the Creative Commons Attribution-NonCommercial (CC BY-NC) 4.0 license.


Published online: 13 June 2022

https://doi.org/10.1075/jul.00002.nor

