From the extraction of continuous features in parallel texts to visual analytics of heterogeneous areal-typological datasets
The aim of this paper is twofold. First, we show that functionally motivated procedural approaches can help to extract typological features from texts automatically. We illustrate this idea by measuring cross-linguistic variation in the domain of morphological typology on the basis of parallel texts. Second, we demonstrate that methods developed in the field of visual analytics make it possible to detect patterns and regularities in the automatically extracted features. At the heart of our approach lies an extended sunburst visualization, which enables the comparison of a large number of features in the context of language genealogy and areal information. We support the usefulness of this approach with case studies in which visualizations of the extracted features reveal interesting patterns.