Learning Lessons from Bilingual Corpora: Benefits for Machine Translation

Streiter, Oliver; Iomdin, Leonid L.

doi:10.1075/ijcl.5.2.06str

Article published in:

International Journal of Corpus Linguistics
Vol. 5:2 (2000) ► pp.199–230


Oliver Streiter | Academia Sinica, Institute of Information Science

Leonid L. Iomdin | Institute for Information Transmission Problems, Russian Academy of Sciences

The research described in this paper grows out of efforts to combine the advantages of corpus-based and rule-based MT approaches in order to improve the performance of MT systems, above all the quality of translation. The authors review ongoing activities in the field and present a case study showing how translation knowledge can be drawn from parallel corpora and compiled into the lexicon of a rule-based MT system. These data are obtained with the help of three procedures: (1) identification of hitherto unknown one-word translations, (2) statistical rating of the known one-word translations, and (3) extraction of new translations of multiword expressions (MWEs), followed by compilation steps which create new rules for the MT engine. As a result, the lexicon is enriched with translation equivalents attested for different subject domains, which facilitates the tuning of the MT system to a specific subject domain and improves the quality and adequacy of translation.
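The abstract does not say which statistics the authors use, so the following is only a minimal sketch of how one-word translation candidates might be rated from a sentence-aligned corpus. It assumes the Dice coefficient over sentence-level co-occurrence, which is a common choice but not necessarily the one used in the paper. The function names `dice_scores` and `rank_candidates`, the `known` parameter, and the toy English-German sentences are all invented for illustration.

```python
from collections import Counter
from itertools import product

# Toy sentence-aligned corpus, invented for illustration only.
TOY_CORPUS = [
    ("the house is small", "das haus ist klein"),
    ("a house is old", "ein haus ist alt"),
    ("the garden is small", "der garten ist klein"),
]


def dice_scores(aligned_pairs):
    """Score every (source word, target word) pair by the Dice coefficient
    of their co-occurrence in aligned sentence pairs."""
    src_freq = Counter()
    tgt_freq = Counter()
    pair_freq = Counter()
    for src_sentence, tgt_sentence in aligned_pairs:
        # Count each word at most once per sentence.
        src_words = set(src_sentence.lower().split())
        tgt_words = set(tgt_sentence.lower().split())
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        pair_freq.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * n / (src_freq[s] + tgt_freq[t])
        for (s, t), n in pair_freq.items()
    }


def rank_candidates(scores, source_word, known=()):
    """Return (target word, score, already in lexicon?) triples for one
    source word, best score first. `known` stands in for the translations
    the rule-based lexicon already lists (an assumption of this sketch)."""
    rows = [
        (t, score, t in known)
        for (s, t), score in scores.items()
        if s == source_word
    ]
    return sorted(rows, key=lambda row: (-row[1], row[0]))


if __name__ == "__main__":
    scores = dice_scores(TOY_CORPUS)
    for target, score, is_known in rank_candidates(scores, "house", known={"haus"}):
        print(f"{target}\t{score:.2f}\t{'known' if is_known else 'new'}")
```

In this sketch, candidates flagged as known correspond loosely to procedure (2), rating translations the lexicon already contains, while high-scoring candidates flagged as new correspond loosely to procedure (1). Multiword expressions, procedure (3), would need alignment beyond single tokens and are not covered here.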

Published online: 30 May 2001

