Multilingual Corpora and Multilingual Corpus Analysis
Editors: Thomas Schmidt and Kai Wörner
This volume deals with different aspects of the creation and use of multilingual corpora. The term 'multilingual corpus' is understood in a comprehensive sense, meaning any systematic collection of empirical language data enabling linguists to carry out analyses of multilingual individuals, multilingual societies or multilingual communication. The individual contributions are thus concerned with a variety of spoken and written corpora ranging from learner and attrition corpora, language contact corpora and interpreting corpora to comparable and parallel corpora. The overarching aim of the volume is first to take stock of the variety of existing multilingual corpora, documenting possible corpus designs and uses, second to discuss methodological and technological challenges in the creation and analysis of multilingual corpora, and third to provide examples of linguistic analyses that were carried out on the basis of multilingual corpora.
[Hamburg Studies on Multilingualism, 14] 2012. xiii, 407 pp.
Publishing status: Available
© John Benjamins Publishing Company
Table of Contents
- Introduction | Thomas Schmidt and Kai Wörner | pp. xi–xiii
Section 1. Learner and attrition corpora
- The LeaP corpus: A multilingual corpus of spoken learner German and learner English | Ulrike Gut | pp. 3–23
- Technological and methodological challenges in creating, annotating and sharing a learner corpus of spoken German | Hanna Hedeland and Thomas Schmidt | pp. 25–46
- Creation and analysis of a reading comprehension exercise corpus: Towards evaluating meaning in context | Niels Ott, Ramon Ziai and Detmar Meurers | pp. 47–69
- The ALeSKo learner corpus: Design – annotation – quantitative analyses | Heike Zinsmeister and Margit Breckle | pp. 71–96
- Corpora of spoken Spanish by simultaneous and successive German-Spanish bilingual and Spanish monolingual children | Marta Saceda Ulloa, Conxita Lleó and Izarbe Garcia Sanchez | pp. 97–106
- Monolingual and bilingual phonoprosodic corpora of child German and child Spanish | Conxita Lleó | pp. 107–122
- Pragmatic corpus analysis, exemplified by Turkish-German bilingual and monolingual data | Annette Herkenrath and Jochen Rehbein | pp. 123–152
- Corpus of Polish spoken in Germany: Collecting and analysing written & spoken data for investigating contact-induced change | Agnieszka Czachór | pp. 153–161
- The HABLA-corpus (German-French and German-Italian) | Tanja Kupisch, Dagmar Barton, Giulia Bianchi and Ilse Stangen | pp. 163–179
Section 2. Language contact corpora
- The Hamburg Corpus of Argentinean Spanish (HaCASpa) | Christoph Gabriel | pp. 183–197
- Ad hoc contact phenomena or established features of a contact variety? Evidence from corpus analysis | Karoline Kühl | pp. 199–214
- Phonoprosodic corpus of spoken Catalan (PhonCAT) | Ariadna Benet, Susana Cortés and Conxita Lleó | pp. 215–229
- Researching the intelligibility of a (German) dialect | Magdalena Putz | pp. 231–243
- Annotating ambiguity: Insights from a corpus-based study on syntactic change in Old Swedish | Steffen Höder | pp. 245–271
Section 3. Interpreting corpora
- Sharing community interpreting corpora: A pilot study | Philipp Angermeyer, Bernd Meyer and Thomas Schmidt | pp. 275–294
- CoSi – A Corpus of Consecutive and Simultaneous Interpreting | Juliane House, Bernd Meyer and Thomas Schmidt | pp. 295–304
- The corpus “Interpreting in Hospitals”: Possible applications for research and communication training | Kristin Bührig, Ortrun Kliche, Bernd Meyer and Birte Pawlack | pp. 305–315
Section 4. Comparable and parallel corpora
- The GeWiss corpus: Comparing spoken academic German, English and Polish | Christian Fandrych, Cordula Meißner and Adriana Slavcheva | pp. 319–337
- Korpus C4: A distributed corpus of German varieties | Henrik Dittmann, Matej Ďurčo, Alexander Geyken, Tobias Roth and Kai Zimmer | pp. 339–346
- Treebanks in translation studies: The CroCo Dependency Treebank | Oliver Čulo and Silvia Hansen-Schirra | pp. 347–361
Section 5. Corpus tools
- Multilingual phonological corpus analysis: The tools behind the PhonBank Project | Yvan Rose | pp. 365–381
- Finding the balance between strict defaults and total openness: Collecting and managing metadata for spoken language corpora with the EXMARaLDA Corpus Manager | Kai Wörner | pp. 383–400
- General index | pp. 401–404
- Corpora index | pp. 405–406
- Language index | p. 407
“This collection of multilingual corpora studies, above all, appeals to a wide readership interested in multilingualism and corpus linguistics. In addition, anyone who is to some extent interested in languages or linguistic studies may find the book useful, as it covers a wide range of areas related to linguistics such as contact situation, interpretation and translation studies and language learning process in terms of various language levels and sub-levels (e.g. spoken and written modes, pronunciation, written essays, etc.). The volume differs from related collections, which focus on only one aspect of bilingual corpora on certain languages (e.g. Johansson, 2007, which focuses on the English-Norwegian Parallel Corpus and the Oslo Multilingual Corpus), or just one level and sub-level of language (e.g. Teubert, 2007, which deals with bilingual and multilingual lexicography and annotation issues). Thus, this volume fills a gap in the literature of multilingualism and corpus linguistics. Another important aspect of the volume is that it includes studies on both small and large corpora and studies that deal with both the creation and analysis of multilingual corpora. The editors’ objectives of (i) introducing the audience to a large number of available multilingual corpora, (ii) raising issues frequently encountered in the methodological and technological aspects of corpus creation, and (iii) presenting a selection of linguistics analyses drawn from multilingual corpora clearly appear to have been achieved.”
Ali Karakas, University of Southampton, on Linguist List 24.2603, 2013
Subjects
Main BIC Subject
CFDM: Bilingualism & multilingualism
Main BISAC Subject
LAN009000: LANGUAGE ARTS & DISCIPLINES / Linguistics / General