Zufferey, Sandrine, Bruno Cartoni and Thomas Meyer. 2013. Using the Europarl corpus for cross-linguistic research. Belgian Journal of Linguistics 27(1): 23–42.
Article in journal
John Benjamins
Europarl is a large multilingual corpus containing the minutes of the debates at the European Parliament. This article presents a method to extract different corpora from Europarl: monolingual and multilingual comparable corpora, as well as parallel corpora. Using state-of-the-art measures of homogeneity, the article shows that these corpora are very similar. It also argues that they offer many advantages for research in various fields of linguistics and translation studies, and it discusses some of their limitations. It concludes by reviewing a number of previous studies that made use of these corpora, emphasizing in each case the possibilities offered by Europarl.