Publication details [#59865]
Gaizauskas, Robert, Monica Lestari Paramita, Emma Barker, Marcis Pinnis, Ahmet Aker and Marta Pahisa Solé. 2015. Extracting bilingual terms from the Web. Terminology 21 (2): 205–236.
Publication type
Article in journal
Publication language
English
Keywords
Language as a subject
Place, Publisher
John Benjamins
Journal DOI
10.1075/term
Annotation
This paper makes two contributions. First, it describes a multi-component system called BiTES (Bilingual Term Extraction System), designed to automatically gather domain-specific bilingual term pairs from Web data. BiTES comprises data gathering tools, domain classifiers, monolingual text extraction systems and bilingual term aligners. It is readily extendable to new language pairs and has been used successfully to gather bilingual terminology for 24 language pairs, including English and all official EU languages, save Irish. Second, the paper describes a novel set of methods for evaluating the main components of BiTES and presents the results of this evaluation for six language pairs. Results show that the BiTES approach can be used to harvest quality bilingual term pairs from the Web. The evaluation method delivers significant insights into the strengths and weaknesses of the techniques used. It can be straightforwardly reused to evaluate other bilingual term extraction systems and makes a novel contribution to the study of how to evaluate such systems.