Cataloging and metadata, formats, and encodings: Sharing data in small and endangered languages

Thieberger, Nicholas; Jacobson, Michel

doi:10.1075/z.158.15thi

Part of

Language Documentation: Practice and values
Edited by Lenore A. Grenoble and N. Louanna Furbee
[Not in series 158] 2010
pp. 147–158

Speakers of small or ‘under-resourced’ languages often first encounter information technology through the efforts of field linguists. Good practices in linguistic data management include the separation of structure and content and of data and metadata formats. The primary outputs of field research (lexicons, transcripts, interlinear glossed text collections, and their associated media) need to be coded and preserved. Long-term access to these data is addressed by establishing archives that also act as a locus for training and for advocacy of well-formed data. In this paper we discuss two such archives: the Pacific and Regional Archive for Digital Sources in Endangered Cultures (PARADISEC) in Australia, and the “Archiving Project” of LACITO/CNRS in France.

Published online: 25 November 2010
