EXMARaLDA – creating, analysing and sharing spoken language corpora for pragmatic research

Thomas Schmidt and Kai Wörner

Abstract

This paper presents EXMARaLDA, a system for the computer-assisted creation and analysis of spoken language corpora. The first part contains some general observations about technological and methodological requirements for doing corpus-based pragmatics. The second part explains the system’s architecture and gives an overview of its most important software components – a transcription editor, a corpus management tool and a corpus query tool. The last part presents some corpora which have been or are currently being compiled with the help of EXMARaLDA.
