Using Parallel Corpora for Translation-Oriented Term Extraction

Vintar, Spela

doi:10.1075/babel.47.2.04vin

Article published in:

Babel
Vol. 47:2 (2001), pp. 121–132

Spela Vintar | University of Ljubljana

In many scientific, technological or political fields, terminology and the production of up-to-date reference works are lagging behind, which causes problems for translators and results in inconsistent translations. Parallel corpora of previously translated texts can be used as a resource for the automatic extraction of terms and terminological collocations. Especially for smaller languages, where existing resources are scarce, collecting and exploiting parallel corpora may be the chief method of obtaining terminological data.

The paper describes how a methodology for multi-word term extraction and bilingual conceptual mapping was developed for Slovene-English terms. We used word-to-word alignment to extract a bilingual glossary of single-word terms; for multi-word terms, two methods were tested and compared. The statistical method is broadly applicable but gives results of very limited use, while the method of syntactic patterns extracts highly useful terminological phrases, but only from a tagged corpus. The paper closes with a vision of further development and of how these methods might be incorporated into existing translation tools.
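As a rough illustration of the syntactic-pattern approach mentioned above, the following minimal sketch collects multi-word term candidates from a part-of-speech tagged corpus. It is not the implementation described in the paper: the tag names, the pattern list, the function name `extract_candidates` and the toy sentences are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical POS patterns for term candidates (assumed, not from the paper).
PATTERNS = [
    ("ADJ", "NOUN"),
    ("NOUN", "NOUN"),
    ("ADJ", "ADJ", "NOUN"),
]


def extract_candidates(tagged_sentences, patterns=PATTERNS):
    """Count word sequences whose POS tags match one of the given patterns."""
    counts = Counter()
    for sentence in tagged_sentences:
        tags = tuple(tag for _, tag in sentence)
        for pattern in patterns:
            n = len(pattern)
            # Slide a window of the pattern's length over the sentence.
            for i in range(len(sentence) - n + 1):
                if tags[i:i + n] == pattern:
                    words = [word.lower() for word, _ in sentence[i:i + n]]
                    counts[" ".join(words)] += 1
    return counts


# Toy tagged corpus, invented for illustration only.
corpus = [
    [("The", "DET"), ("common", "ADJ"), ("agricultural", "ADJ"),
     ("policy", "NOUN"), ("applies", "VERB")],
    [("Agricultural", "ADJ"), ("policy", "NOUN"), ("is", "VERB"),
     ("reviewed", "VERB")],
]

candidates = extract_candidates(corpus)
for phrase, freq in candidates.most_common():
    print(freq, phrase)
```

The sketch also shows why this kind of method depends on a tagged corpus: without the tag sequence there is nothing for the patterns to match against, whereas a purely statistical method would only need the word forms.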

Published online: 11 March 2002

https://doi.org/10.1075/babel.47.2.04vin

Cited by

Cited by 1 other publication

Otero, Pablo Gamallo

2006. Using Natural Alignment to Extract Translation Equivalents. In Computational Processing of the Portuguese Language [Lecture Notes in Computer Science, 3960], pp. 41 ff.

This list is based on CrossRef data as of 25 March 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.