Multilingual Corpora and Multilingual Corpus Analysis

Edited by Thomas Schmidt and Kai Wörner
University of Hamburg
This volume deals with different aspects of the creation and use of multilingual corpora. The term 'multilingual corpus' is understood in a comprehensive sense, meaning any systematic collection of empirical language data that enables linguists to carry out analyses of multilingual individuals, multilingual societies or multilingual communication. The individual contributions are thus concerned with a variety of spoken and written corpora, ranging from learner and attrition corpora, language contact corpora and interpreting corpora to comparable and parallel corpora. The overarching aim of the volume is threefold: first, to take stock of the variety of existing multilingual corpora, documenting possible corpus designs and uses; second, to discuss methodological and technological challenges in the creation and analysis of multilingual corpora; and third, to provide examples of linguistic analyses carried out on the basis of multilingual corpora.
[Hamburg Studies on Multilingualism, 14]  2012.  xiii, 407 pp.
Publishing status: Available
Hardbound: Available
ISBN 9789027219343 | EUR 75.00 | USD 113.00

e-Book: Sold by e-book platforms
ISBN 9789027273444 | EUR 75.00 | USD 113.00

Table of Contents

Introduction
Thomas Schmidt and Kai Wörner
xi–xiii
Section 1. Learner and attrition corpora
The LeaP corpus: A multilingual corpus of spoken learner German and learner English
Ulrike Gut
3–23
Technological and methodological challenges in creating, annotating and sharing a learner corpus of spoken German
Hanna Hedeland and Thomas Schmidt
25–46
Creation and analysis of a reading comprehension exercise corpus: Towards evaluating meaning in context
Niels Ott, Ramon Ziai and Detmar Meurers
47–69
The ALeSKo learner corpus: Design – annotation – quantitative analyses
Heike Zinsmeister and Margit Breckle
71–96
Corpora of spoken Spanish by simultaneous and successive German-Spanish bilingual and Spanish monolingual children
Marta Saceda Ulloa, Conxita Lleó and Izarbe Garcia Sanchez
97–106
Monolingual and bilingual phonoprosodic corpora of child German and child Spanish
Conxita Lleó
107–122
Pragmatic corpus analysis, exemplified by Turkish-German bilingual and monolingual data
Annette Herkenrath and Jochen Rehbein
123–152
Corpus of Polish spoken in Germany: Collecting and analysing written & spoken data for investigating contact-induced change
Agnieszka Czachór
153–161
The HABLA-corpus (German-French and German-Italian)
Tanja Kupisch, Dagmar Barton, Giulia Bianchi and Ilse Stangen
163–179
Section 2. Language contact corpora
The Hamburg Corpus of Argentinean Spanish (HaCASpa)
Christoph Gabriel
183–197
Ad hoc contact phenomena or established features of a contact variety?: Evidence from corpus analysis
Karoline H. Kühl
199–214
Phonoprosodic corpus of spoken Catalan (PhonCAT)
Ariadna Benet, Susana Cortés and Conxita Lleó
215–229
Researching the intelligibility of a (German) dialect
Magdalena Putz
231–243
Annotating ambiguity: Insights from a corpus-based study on syntactic change in Old Swedish
Steffen Höder
245–271
Section 3. Interpreting corpora
Sharing community interpreting corpora: A pilot study
Philipp Sebastian Angermeyer, Bernd Meyer and Thomas Schmidt
275–294
CoSi – A Corpus of Consecutive and Simultaneous Interpreting
Juliane House, Bernd Meyer and Thomas Schmidt
295–304
The corpus “Interpreting in Hospitals”: Possible applications for research and communication training
Kristin Bührig, Ortrun Kliche, Bernd Meyer and Birte Pawlack
305–315
Section 4. Comparable and parallel corpora
The GeWiss corpus: Comparing spoken academic German, English and Polish
Christian Fandrych, Cordula Meißner and Adriana Slavcheva
319–337
Korpus C4: A distributed corpus of German varieties
Henrik Dittmann, Matej Ďurčo, Alexander Geyken, Tobias Roth and Kai Zimmer
339–346
Treebanks in translation studies: The CroCo Dependency Treebank
Oliver Čulo and Silvia Hansen-Schirra
347–361
Section 5. Corpus tools
Multilingual phonological corpus analysis: The tools behind the PhonBank Project
Yvan Rose
365–381
Finding the balance between strict defaults and total openness: Collecting and managing metadata for spoken language corpora with the EXMARaLDA Corpus Manager
Kai Wörner
383–400
General index
401–404
Corpora index
405–406
Language index
407

Subjects

BIC Subject

CFDM: Bilingualism & multilingualism

BISAC Subject

LAN009000: LANGUAGE ARTS & DISCIPLINES / Linguistics
U.S. Library of Congress Control Number:  2012021737