Learning Lessons from Bilingual Corpora: Benefits for Machine Translation
Leonid L. Iomdin | Institute for Information Transmission Problems, Russian Academy of Sciences
The research described in this paper stems from efforts to combine the advantages of corpus-based and rule-based MT approaches in order to improve the performance of MT systems, most importantly the quality of translation. The authors review ongoing activities in the field and present a case study showing how translation knowledge can be drawn from parallel corpora and compiled into the lexicon of a rule-based MT system. These data are obtained with the help of three procedures: (1) identification of hitherto unknown one-word translations, (2) statistical rating of known one-word translations, and (3) extraction of new translations of multiword expressions (MWEs), followed by compilation steps that create new rules for the MT engine. As a result, the lexicon is enriched with translation equivalents attested in different subject domains, which makes it easier to tune the MT system to a specific domain and improves the quality and adequacy of translation.
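The abstract does not say how translations are identified or rated. The following is a minimal Python sketch of procedures (1) and (2) under stated assumptions; it is not the authors' method. It assumes that word-alignment links tagged with a subject domain are already available from some external aligner, and it uses plain relative frequency as the rating. The function name `rate_translations`, the `(source, target, domain)` input format, the `min_count` threshold and the English-Russian sample data are all hypothetical.

```python
from collections import Counter, defaultdict


def rate_translations(links, lexicon, min_count=2):
    """Rate known one-word translations per domain and collect unknown candidates.

    links: iterable of (source_word, target_word, domain) alignment links
           (hypothetical input format, assumed to come from a word aligner).
    lexicon: dict mapping source_word -> set of target words already listed.
    min_count: hypothetical threshold; unlisted pairs seen fewer times are ignored.
    """
    known = defaultdict(Counter)    # (source, domain) -> counts of listed targets
    unknown = defaultdict(Counter)  # source -> counts of unlisted targets
    for src, tgt, domain in links:
        if tgt in lexicon.get(src, set()):
            known[(src, domain)][tgt] += 1
        else:
            unknown[src][tgt] += 1

    # Procedure (2), assumed here to be relative frequency within each domain,
    # sorted from most to least frequent (ties broken alphabetically).
    ratings = {}
    for key, counter in known.items():
        total = sum(counter.values())
        ratings[key] = sorted(
            ((tgt, n / total) for tgt, n in counter.items()),
            key=lambda pair: (-pair[1], pair[0]),
        )

    # Procedure (1): unlisted targets attested at least min_count times
    # become candidate new translations.
    candidates = {
        src: sorted(t for t, n in counter.items() if n >= min_count)
        for src, counter in unknown.items()
    }
    candidates = {src: ts for src, ts in candidates.items() if ts}
    return ratings, candidates


# Illustrative (invented) data for an English-Russian lexicon entry.
lexicon = {"bank": {"банк", "берег"}}
links = [
    ("bank", "банк", "finance"),
    ("bank", "банк", "finance"),
    ("bank", "берег", "finance"),
    ("bank", "берег", "geography"),
    ("bank", "отмель", "geography"),
    ("bank", "отмель", "geography"),
]
ratings, candidates = rate_translations(links, lexicon)
print(ratings[("bank", "finance")])  # "банк" rated above "берег"
print(candidates)                    # {'bank': ['отмель']}
```

Keying the ratings by domain mirrors the abstract's point that equivalents are attested per subject domain, so a domain-tuned system could prefer the top-rated translation for its domain. Relative frequency is only one possible statistic, and a real pipeline would also need lemmatization and filtering of alignment noise.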
Cited by (1)
Laukaitis, Algirdas & Olegas Vasilecas. 2007. Asymmetric Hybrid Machine Translation for Languages with Scarce Resources. In Computational Linguistics and Intelligent Text Processing [Lecture Notes in Computer Science, 4394], pp. 397 ff.
This list is based on CrossRef data as of 5 August 2024.