To understand how children cope with the enormous variation in linguistic structures worldwide, developmental paths need to be studied in a sufficiently varied sample of languages. Because each study requires a very large and expensive longitudinal corpus (about one million words, spanning five to seven years of development), the sample must be chosen strategically. We propose to base the choice on the results of a fuzzy clustering algorithm applied to typological databases. The algorithm establishes a sample that maximizes the typological differences between languages. As a case study, we apply the algorithm to a dozen typological variables known to have an impact on acquisition: the presence and nature of agreement and case marking, word order, degrees of synthesis, polyexponence and inflectional compactness of categories, syncretism, the existence of inflectional classes, etc. The results make it possible to derive small samples that are maximally diverse. As a side result, we also note that while the clustering algorithm maximizes diversity for sampling purposes, the resulting clusters themselves are far from discrete and therefore do not reflect a natural partition into basic language types.
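The sampling idea can be illustrated with a small sketch. It is not the procedure used in the study: the abstract does not say which fuzzy clustering variant was applied, so plain fuzzy c-means stands in here as an assumption. The six languages, their three feature values, the fuzzifier `m = 2`, and the helper names `fuzzy_c_means` and `pick_sample` are all hypothetical. The sketch clusters the languages, then takes from each cluster the language with the highest membership as that cluster's representative.

```python
# Hypothetical toy data: rows are languages, columns are typological
# variables scaled to [0, 1] (e.g. case marking, agreement, synthesis).
names = ["Lang A", "Lang B", "Lang C", "Lang D", "Lang E", "Lang F"]
points = [
    [1.0, 1.0, 0.9], [0.9, 1.0, 1.0],   # rich marking, high synthesis
    [0.0, 0.0, 0.1], [0.1, 0.0, 0.0],   # little marking, low synthesis
    [1.0, 0.0, 0.5], [0.9, 0.1, 0.5],   # case but almost no agreement
]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def memberships(points, centers, m):
    """Fuzzy c-means membership update; each row sums to 1."""
    rows = []
    for p in points:
        # Clamp to avoid division by zero when a point sits on a center.
        inv = [max(sq_dist(p, c), 1e-12) ** (-1.0 / (m - 1.0)) for c in centers]
        total = sum(inv)
        rows.append([v / total for v in inv])
    return rows

def fuzzy_c_means(points, k, m=2.0, iters=50):
    """Fuzzy c-means with deterministic farthest-first initial centers."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    u = memberships(points, centers, m)
    for _ in range(iters):
        # Centers are means of the points, weighted by membership ** m.
        centers = []
        for j in range(k):
            w = [row[j] ** m for row in u]
            total = sum(w)
            centers.append([
                sum(wi * p[t] for wi, p in zip(w, points)) / total
                for t in range(len(points[0]))
            ])
        u = memberships(points, centers, m)
    return centers, u

def pick_sample(names, u):
    """One language per cluster: the one with the highest membership."""
    k = len(u[0])
    return [names[max(range(len(names)), key=lambda i: u[i][j])] for j in range(k)]

centers, u = fuzzy_c_means(points, k=3)
sample = pick_sample(names, u)
print(sample)
for name, row in zip(names, u):
    print(name, [round(v, 3) for v in row])
```

Each language receives a graded membership in every cluster rather than a single hard label. With real typological data those grades would be expected to be far less clear-cut than in this toy example, which is the sense in which the clusters serve sampling without defining discrete language types.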