Europarl is a large multilingual corpus containing the minutes of the debates at the European Parliament. This article presents a method to extract different corpora from Europarl: monolingual and multilingual comparable corpora, as well as parallel corpora. Using state-of-the-art measures of homogeneity, we show that these corpora are very similar. In addition, we argue that they present many advantages for research in various fields of linguistics and translation studies, and we also discuss some of their limitations. We conclude by reviewing a number of previous studies that made use of these corpora, emphasizing in each case the possibilities offered by Europarl.