Chapter published in:
Parallel Corpora for Contrastive and Translation Studies: New resources and applications. Edited by Irene Doval and M. Teresa Sánchez Nieto
[Studies in Corpus Linguistics 90] 2019
pp. 123–139
Building EPTIC
A many-sided, multi-purpose corpus of EU parliament proceedings
Adriano Ferraresi | University of Bologna
Silvia Bernardini | University of Bologna
This chapter describes the steps involved in the construction of EPTIC, an intermodal corpus of European Parliament speeches. Despite its limited size, the corpus has features that justify its labour-intensive building process, in particular its multiple alignments. Text-to-text alignments allow users to compare interpretations and translations of source speeches and their written-up reports, while text-to-video alignments let them access the multimedia components directly from concordance lines. To illustrate the potential of EPTIC, a case study of English loan words in original, translated and interpreted Italian and French is presented. Results suggest that borrowing is more likely to occur in translated Italian than in any of the other corpus components.
Keywords: intermodal corpora, text-to-text alignment, text-to-video alignment, corpus annotation, loan words
Article outline
- 1. Introduction: Why another corpus of European Parliament speeches?
- 2. What EPTIC looks like
- 2.1 One corpus, fourteen subcorpora
- 2.2 Practical details: Size and availability
- 3. Building EPTIC
- 3.1 Selecting and obtaining raw corpus materials
- 3.2 Transcribing the oral data
- 3.3 Adding metadata
- 3.4 Performing text-to-text alignment
- 3.5 Performing text-to-video alignment
- 3.6 POS-tagging, lemmatization and indexing
- 4. An example: English loan words in Italian and French
- 5. Conclusion: Teaming up
- Acknowledgement
- Notes
- References
Published online: 20 March 2019
https://doi.org/10.1075/scl.90.08fer