TExSIS: Bilingual terminology extraction from parallel corpora using chunk-based alignment
We report on TExSIS, a flexible bilingual terminology extraction system that uses a sophisticated chunk-based alignment method to generate candidate terms; the specificity of these candidate terms is then determined by combining several statistical filters. The architecture is largely language-independent; here we present terminology extraction results for four different languages and three language pairs. Gold standard data sets were created for French-Italian, French-English and French-Dutch, which allowed us to evaluate not only precision, as is common practice, but also recall. We compared the TExSIS approach, which takes a multilingual perspective from the start, with the more commonly used approach of first identifying term candidates monolingually and then aligning the source and target terms. A comparison of our system with the LUIZ approach described by Vintar (2010) reveals that TExSIS outperforms LUIZ for both monolingual and bilingual terminology extraction. Our results also clearly show that the precision of the alignment is crucial for the success of the terminology extraction. Furthermore, based on the observation that the precision scores for bilingual terminology extraction are higher than those of the monolingual systems, we conclude that multilingual evidence helps to determine unithood in less related languages.
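Having a gold standard makes it possible to measure recall as well as precision. The sketch below shows how precision, recall and F1 can be computed for extracted bilingual term pairs. It assumes exact, case-insensitive matching of (source, target) pairs; the matching criterion actually used in the TExSIS evaluation may differ (for example, it may give credit for partial matches). The function name `precision_recall` and the French-Dutch example pairs are hypothetical and are not taken from the paper's data.

```python
from typing import Iterable, Set, Tuple

# A bilingual term pair: (source-language term, target-language term)
TermPair = Tuple[str, str]


def precision_recall(extracted: Iterable[TermPair],
                     gold: Iterable[TermPair]) -> Tuple[float, float, float]:
    """Hypothetical helper: exact-match precision, recall and F1 against a gold standard.

    Assumption: a pair counts as correct only if both terms match a gold pair
    exactly after lowercasing; no credit is given for partial matches.
    """
    # Lowercased sets, so duplicates and casing differences are not counted twice
    ext: Set[TermPair] = {(s.lower(), t.lower()) for s, t in extracted}
    ref: Set[TermPair] = {(s.lower(), t.lower()) for s, t in gold}
    correct = len(ext & ref)
    precision = correct / len(ext) if ext else 0.0
    recall = correct / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical French-Dutch example (illustrative only, not from the paper's gold standard)
gold = [("pression artérielle", "bloeddruk"),
        ("insuffisance cardiaque", "hartfalen"),
        ("valve mitrale", "mitralisklep")]
extracted = [("pression artérielle", "bloeddruk"),
             ("Insuffisance cardiaque", "hartfalen"),
             ("patient", "patiënt")]

if __name__ == "__main__":
    # Two of three extracted pairs are correct and two of three gold pairs are found,
    # so precision, recall and F1 are all 2/3 here.
    print(precision_recall(extracted, gold))
```

Without a gold standard only the precision side of this computation is available, since recall needs the full set of reference term pairs in the denominator.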
Keywords: automatic term extraction, bilingual term extraction, chunks, alignment, parallel corpora
Published online: 29 April 2013
https://doi.org/10.1075/term.19.1.01mac