EXMARaLDA – creating, analysing and sharing spoken language corpora for pragmatic research

Thomas Schmidt and Kai Wörner

Abstract

This paper presents EXMARaLDA, a system for the computer-assisted creation and analysis of spoken language corpora. The first part contains some general observations about technological and methodological requirements for doing corpus-based pragmatics. The second part explains the system’s architecture and gives an overview of its most important software components – a transcription editor, a corpus management tool and a corpus query tool. The last part presents some corpora which have been or are currently being compiled with the help of EXMARaLDA.
