From the extraction of continuous features in parallel texts to visual analytics of heterogeneous areal-typological datasets
The aim of this paper is twofold. First, we show that functionally motivated procedural approaches can help to extract typological features from texts automatically. We illustrate this idea by measuring cross-linguistic variation in morphological typology on the basis of parallel texts. Second, we demonstrate that methods developed in the field of visual analytics make it possible to detect patterns and regularities in the automatically extracted features. At the heart of our approach lies an extended sunburst visualization, which enables the cross-comparison of a large number of features within the context of language genealogy and areal information. We provide evidence of the usefulness of this approach with case studies in which visualizations of the extracted features reveal interesting insights.