The paper describes LUIZ, a bilingual term recognition system developed for the Slovene-English language pair. The system is a hybrid term extractor that uses morphosyntactic patterns and statistical ranking to propose domain-specific expressions in each of the two languages, after which translation equivalents between the languages are identified using the innovative bag-of-equivalents approach. This simple but effective method uses the Twente word aligner to obtain a lexicon of single-word translation pairs with their probability scores, which is then used to identify correspondences between multi-word terms. The bilingual term recognition system has been tested and evaluated on three parallel subcorpora from the tourism, accounting and military domains. The average precision of the term alignment component is 0.83, where only fully equivalent and domain-relevant terms were counted as positives. A further advantage of the described approach is that it successfully detects term variants and multiple translations of a candidate multi-word term. Since our term alignment method does not require sentence-aligned corpora, it can also be used with comparable corpora, provided a domain-specific lexicon or dictionary of single-word correspondences is already available. The paper concludes with some thoughts on the users of term recognition systems and their needs, based on our observations from the online version of the system.
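The idea of matching multi-word terms through a single-word lexicon can be illustrated with a minimal sketch. It is not the LUIZ implementation: the function names, the averaging score, the threshold of 0.5 and the toy lexicon probabilities are all assumptions made for illustration, and the input terms are assumed to be lemmatized already.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon shape: source word -> {target word: probability}.
# In the paper such a lexicon comes from a word aligner; here it is hand-made.
Lexicon = Dict[str, Dict[str, float]]


def bag_of_equivalents(term: str, lexicon: Lexicon) -> Dict[str, float]:
    """Collect all target-language equivalents of the words in a source term."""
    bag: Dict[str, float] = {}
    for word in term.lower().split():
        for target, prob in lexicon.get(word, {}).items():
            # Keep the highest probability if a target word occurs twice.
            bag[target] = max(bag.get(target, 0.0), prob)
    return bag


def score(bag: Dict[str, float], candidate: str) -> float:
    """Assumed scoring: mean lexicon probability over the candidate's words."""
    words = candidate.lower().split()
    if not words:
        return 0.0
    return sum(bag.get(w, 0.0) for w in words) / len(words)


def align(source_terms: List[str], target_terms: List[str],
          lexicon: Lexicon, threshold: float = 0.5
          ) -> Dict[str, List[Tuple[str, float]]]:
    """For each source term, list target terms scoring at least `threshold`."""
    result: Dict[str, List[Tuple[str, float]]] = {}
    for s in source_terms:
        bag = bag_of_equivalents(s, lexicon)
        scored = [(t, score(bag, t)) for t in target_terms]
        kept = [(t, sc) for t, sc in scored if sc >= threshold]
        result[s] = sorted(kept, key=lambda pair: pair[1], reverse=True)
    return result


# Toy data; the probabilities are invented for the example.
toy_lexicon: Lexicon = {
    "turistična": {"tourist": 0.8, "tourism": 0.2},
    "kmetija": {"farm": 0.9},
}
toy_targets = ["tourist farm", "tourism farm", "farm holiday", "hotel"]
toy_result = align(["turistična kmetija"], toy_targets, toy_lexicon)

for target, sc in toy_result["turistična kmetija"]:
    print(f"{target}: {sc:.2f}")
```

Because every target candidate above the threshold is kept rather than only the best one, the sketch returns several translations for one source term, which mirrors the abstract's remark about detecting term variants and multiple translations. No sentence alignment is used at any point; only the two candidate lists and the lexicon are needed.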