What matters more: The size of the corpora or their quality?
The case of automatic translation of multiword expressions using comparable corpora
This study investigates and compares how two properties of comparable corpora affect one specific task: extracting translation equivalents of verb-noun collocations from such corpora. The first property is corpus size. The second is corpus similarity, which serves here as the measure of quality. A comprehensive evaluation of different configurations of English and Spanish corpora sheds some light on a more general and perennial question: what matters more, the quantity or the quality of corpora?
Article outline
- 1. Rationale
- 2. Our methodology for translating multiword expressions
- 3. Data and experiments
- 3.1 Comparable corpora
- 3.2 Data
- 3.3 Vector representations
- 3.4 Gold standard
- 4. Comparable corpora and translation of MWEs: Size vs. quality
- 5. Conclusion