Multilingual Corpora and Multilingual Corpus Analysis

Editors

Thomas Schmidt | University of Hamburg

Kai Wörner | University of Hamburg

Hardbound – Available

ISBN 9789027219343 | EUR 75.00 | USD 113.00

e-Book

ISBN 9789027273444 | EUR 75.00 | USD 113.00

This volume deals with different aspects of the creation and use of multilingual corpora. The term 'multilingual corpus' is understood in a comprehensive sense, meaning any systematic collection of empirical language data enabling linguists to carry out analyses of multilingual individuals, multilingual societies or multilingual communication. The individual contributions are thus concerned with a variety of spoken and written corpora ranging from learner and attrition corpora, language contact corpora and interpreting corpora to comparable and parallel corpora. The overarching aim of the volume is first to take stock of the variety of existing multilingual corpora, documenting possible corpus designs and uses, second to discuss methodological and technological challenges in the creation and analysis of multilingual corpora, and third to provide examples of linguistic analyses that were carried out on the basis of multilingual corpora.

[Hamburg Studies on Multilingualism, 14] 2012. xiii, 407 pp.

Publishing status: Available

https://doi.org/10.1075/hsm.14

Table of Contents

Introduction

Thomas Schmidt and Kai Wörner | pp. xi–xiii

Section 1. Learner and attrition corpora

The LeaP corpus: A multilingual corpus of spoken learner German and learner English

Ulrike Gut | pp. 3–23

Technological and methodological challenges in creating, annotating and sharing a learner corpus of spoken German

Hanna Hedeland and Thomas Schmidt | pp. 25–46

Creation and analysis of a reading comprehension exercise corpus: Towards evaluating meaning in context

Niels Ott, Ramon Ziai and Detmar Meurers | pp. 47–69

The ALeSKo learner corpus: Design – annotation – quantitative analyses

Heike Zinsmeister and Margit Breckle | pp. 71–96

Corpora of spoken Spanish by simultaneous and successive German-Spanish bilingual and Spanish monolingual children

Marta Saceda Ulloa, Conxita Lleó and Izarbe Garcia Sanchez | pp. 97–106

Monolingual and bilingual phonoprosodic corpora of child German and child Spanish

Conxita Lleó | pp. 107–122

Pragmatic corpus analysis, exemplified by Turkish-German bilingual and monolingual data

Annette Herkenrath and Jochen Rehbein | pp. 123–152

Corpus of Polish spoken in Germany: Collecting and analysing written & spoken data for investigating contact-induced change

Agnieszka Czachór | pp. 153–161

The HABLA-corpus (German-French and German-Italian)

Tanja Kupisch, Dagmar Barton, Giulia Bianchi and Ilse Stangen | pp. 163–179

Section 2. Language contact corpora

The Hamburg Corpus of Argentinean Spanish (HaCASpa)

Christoph Gabriel | pp. 183–197

Ad hoc contact phenomena or established features of a contact variety? Evidence from corpus analysis

Karoline Kühl | pp. 199–214

Phonoprosodic corpus of spoken Catalan (PhonCAT)

Ariadna Benet, Susana Cortés and Conxita Lleó | pp. 215–229

Researching the intelligibility of a (German) dialect

Magdalena Putz | pp. 231–243

Annotating ambiguity: Insights from a corpus-based study on syntactic change in Old Swedish

Steffen Höder | pp. 245–271

Section 3. Interpreting corpora

Sharing community interpreting corpora: A pilot study

Philipp Angermeyer, Bernd Meyer and Thomas Schmidt | pp. 275–294

CoSi – A Corpus of Consecutive and Simultaneous Interpreting

Juliane House, Bernd Meyer and Thomas Schmidt | pp. 295–304

The corpus “Interpreting in Hospitals”: Possible applications for research and communication training

Kristin Bührig, Ortrun Kliche, Bernd Meyer and Birte Pawlack | pp. 305–315

Section 4. Comparable and parallel corpora

The GeWiss corpus: Comparing spoken academic German, English and Polish

Christian Fandrych, Cordula Meißner and Adriana Slavcheva | pp. 319–337

Korpus C4: A distributed corpus of German varieties

Henrik Dittmann, Matej Ďurčo, Alexander Geyken, Tobias Roth and Kai Zimmer | pp. 339–346

Treebanks in translation studies: The CroCo Dependency Treebank

Oliver Čulo and Silvia Hansen-Schirra | pp. 347–361

Section 5. Corpus tools

Multilingual phonological corpus analysis: The tools behind the PhonBank Project

Yvan Rose | pp. 365–381

Finding the balance between strict defaults and total openness: Collecting and managing metadata for spoken language corpora with the EXMARaLDA Corpus Manager

Kai Wörner | pp. 383–400

General index | pp. 401–404

Corpora index | pp. 405–406

Language index | p. 407

“This collection of multilingual corpora studies, above all, appeals to a wide readership interested in multilingualism and corpus linguistics. In addition, anyone who is to some extent interested in languages or linguistic studies may find the book useful, as it covers a wide range of areas related to linguistics such as contact situation, interpretation and translation studies and language learning process in terms of various language levels and sub-levels (e.g. spoken and written modes, pronunciation, written essays, etc.). The volume differs from related collections, which focus on only one aspect of bilingual corpora on certain languages (e.g. Johansson, 2007, which focuses on the English-Norwegian Parallel Corpus and the Oslo Multilingual Corpus), or just one level and sub-level of language (e.g. Teubert, 2007, which deals with bilingual and multilingual lexicography and annotation issues). Thus, this volume fills a gap in the literature of multilingualism and corpus linguistics. Another important aspect of the volume is that it includes studies on both small and large corpora and studies that deal with both the creation and analysis of multilingual corpora. The editors’ objectives of (i) introducing the audience to a large number of available multilingual corpora, (ii) raising issues frequently encountered in the methodological and technological aspects of corpus creation, and (iii) presenting a selection of linguistics analyses drawn from multilingual corpora clearly appear to have been achieved.”

Ali Karakas, University of Southampton, on Linguist List 24.2603, 2013

Cited by

Cited by 12 other publications

Casalicchio, Jan & Manuela Caterina Moroni

2023. The Syntax–Pragmatics Interface in Heritage Languages: The Use of anche (“Also”) in German Heritage Speakers of Italian. Languages 8:2 ► pp. 104 ff.

Domke, Christine & Christina Gansel

2014. Zur Einführung: Korpora in der Linguistik - Perspektiven und Positionen zu Daten und Datenerhebung. Mitteilungen des Deutschen Germanistenverbandes 61:1 ► pp. 1 ff.

Gromann, Dagmar, Elena-Simona Apostol, Christian Chiarcos, Marco Cremaschi, Jorge Gracia, Katerina Gkirtzou, Chaya Liebeskind, Liudmila Mockiene, Michael Rosner, Ineke Schuurman, Gilles Sérasset, Purificação Silvano, Blerina Spahiu, Ciprian-Octavian Truică, Andrius Utka, Giedre Valunaite Oleskeviciene & Harald Sack

2024. Multilinguality and LLOD: A survey across linguistic description levels. Semantic Web ► pp. 1 ff.

Hantgan-Sonko, Abbie

2017. Crossroads Corpus creation: Design and case study. Yearbook of the Poznan Linguistic Meeting 3:1 ► pp. 1 ff.

Herry-Bénit, Nadine, Stéphanie Lopez, Takeki Kamiyama & Jeff Tennant

2021. The interphonology of contemporary English corpus (IPCE-IPAC). International Journal of Learner Corpus Research 7:2 ► pp. 275 ff.

Kupisch, Tanja & Jason Rothman

2018. Terminology matters! Why difference is not incompleteness and how early child bilinguals are heritage speakers. International Journal of Bilingualism 22:5 ► pp. 564 ff.

Labrador, Belen

2016. Translation as an aid to ELT: Using an English–Spanish parallel corpus (P-ACTRES) to study English both and its Spanish counterparts. Digital Scholarship in the Humanities 31:3 ► pp. 499 ff.

Lázaro Gutiérrez, Raquel & María del Mar Sánchez Ramos

2015. Corpus-Based Interpreting Studies and Public Service Interpreting and Translation Training Programs: The Case of Interpreters Working in Gender Violence Contexts. In Yearbook of Corpus Linguistics and Pragmatics 2015 [Yearbook of Corpus Linguistics and Pragmatics, 3], ► pp. 275 ff.

Trouvain, Jürgen, Frank Zimmerer, Bernd Möbius, Mária Gósy & Anne Bonneau

2017. Segmental, prosodic and fluency features in phonetic learner corpora. International Journal of Learner Corpus Research 3:2 ► pp. 105 ff.

Vessey, Rachelle

2019. Arja Nurmi, Tanja Rütten, and Päivi Pahta (eds): Challenging the Myth of Monolingual Corpora. Applied Linguistics 40:5 ► pp. 864 ff.

Șan, Nebiye Hilal

2023. Subordination in Turkish Heritage Children with and without Developmental Language Impairment. Languages 8:4 ► pp. 239 ff.

[no author supplied]

2020. World Englishes on the Web [Varieties of English Around the World, G63].

This list is based on CrossRef data as of 16 April 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.

Subjects

Linguistics

Corpus linguistics

Multilingualism

Applied linguistics

Main BIC Subject

CFDM: Bilingualism & multilingualism

Main BISAC Subject

LAN009000: LANGUAGE ARTS & DISCIPLINES / Linguistics / General

U.S. Library of Congress Control Number: 2012021737