Chapter published in: Computational Phraseology
Edited by Gloria Corpas Pastor and Jean-Pierre Colson
[IVITRA Research in Linguistics and Literature 24] 2020
pp. 178–187
What matters more: The size of the corpora or their quality?
The case of automatic translation of multiword expressions using comparable corpora
This study investigates and compares the impact of the size and the similarity/quality of comparable corpora on a specific task: extracting translation equivalents of verb-noun collocations from such corpora. A comprehensive evaluation of different configurations of English and Spanish corpora sheds light on a more general, perennial question: what matters more, the quantity or the quality of corpora?
Keywords: multiword expressions, automatic translation, comparable corpora, size of corpora, vector representations
Published online: 08 May 2020