This paper describes TermEnsembler, a bilingual term extraction and alignment system that uses a novel ensemble learning approach to bilingual term alignment. The system first extracts monolingual terms from a file in a language-industry standard format containing aligned English and Slovenian texts. The two resulting term lists are then aligned automatically by an ensemble of seven bilingual alignment methods, which are first run separately and then merged using weights learned with an evolutionary algorithm. In the experiments, the weights were learned on one domain and tested on two other domains. On the top 400 aligned term pairs, alignment precision exceeds 96%, and multi-word unit terms account for more than 30% of the correctly aligned pairs.
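The abstract does not specify the seven alignment methods, how their outputs are combined, or which evolutionary algorithm is used. The Python sketch below shows one way the described pipeline could work under stated assumptions: each method gives a score in [0, 1] for a candidate (English, Slovenian) term pair, the ensemble score is a weighted sum, and the weights are tuned by a mutation-only (mu + lambda) evolutionary loop that maximizes precision over the top k aligned pairs. All function names (`ensemble_scores`, `align`, `precision_at_k`, `learn_weights`) and parameters are hypothetical illustrations, not TermEnsembler's implementation.

```python
# Hypothetical sketch: not TermEnsembler's actual code. Per-method scores are
# assumed to be precomputed inputs; the weighted sum and the evolutionary loop
# are illustrative choices.
import random
from typing import Dict, List, Sequence, Set, Tuple

Pair = Tuple[str, str]  # (English term, Slovenian term)


def ensemble_scores(method_scores: Dict[Pair, Sequence[float]],
                    weights: Sequence[float]) -> Dict[Pair, float]:
    """Merge the per-method scores of each candidate pair into one weighted sum."""
    return {pair: sum(w * s for w, s in zip(weights, scores))
            for pair, scores in method_scores.items()}


def align(method_scores: Dict[Pair, Sequence[float]],
          weights: Sequence[float]) -> List[Pair]:
    """Keep the best-scoring target for each source term, ranked by merged score."""
    merged = ensemble_scores(method_scores, weights)
    best: Dict[str, Tuple[float, str]] = {}
    for (src, tgt), score in merged.items():
        if src not in best or score > best[src][0]:
            best[src] = (score, tgt)
    ranked = sorted(best.items(), key=lambda item: item[1][0], reverse=True)
    return [(src, tgt) for src, (_, tgt) in ranked]


def precision_at_k(aligned: List[Pair], gold: Set[Pair], k: int) -> float:
    """Share of the top-k aligned pairs that appear in the gold-standard set."""
    top = aligned[:k]
    return sum(pair in gold for pair in top) / len(top) if top else 0.0


def learn_weights(method_scores: Dict[Pair, Sequence[float]], gold: Set[Pair],
                  k: int, n_methods: int, pop_size: int = 20,
                  generations: int = 50, sigma: float = 0.1,
                  seed: int = 0) -> List[float]:
    """Mutation-only (mu + lambda) search: parents and children compete each generation."""
    rng = random.Random(seed)

    def fitness(weights: List[float]) -> float:
        return precision_at_k(align(method_scores, weights), gold, k)

    population = [[rng.random() for _ in range(n_methods)] for _ in range(pop_size)]
    for _ in range(generations):
        # Each child is a Gaussian-mutated copy of a random parent; weights stay >= 0.
        children = [[max(0.0, w + rng.gauss(0.0, sigma)) for w in rng.choice(population)]
                    for _ in range(pop_size)]
        # Elitist selection: keep the pop_size fittest of parents plus children.
        population = sorted(population + children, key=fitness, reverse=True)[:pop_size]
    return population[0]


# Toy usage with two hypothetical methods (the real system combines seven).
scores = {
    ("term", "termin"): [0.9, 0.1],
    ("term", "beseda"): [0.2, 0.8],
    ("word", "beseda"): [0.8, 0.3],
    ("word", "termin"): [0.1, 0.9],
}
gold = {("term", "termin"), ("word", "beseda")}
weights = learn_weights(scores, gold, k=2, n_methods=2)
print(weights, precision_at_k(align(scores, weights), gold, k=2))
```

To mirror the cross-domain setup in the abstract, the weights returned by `learn_weights` on one domain's gold pairs would simply be passed to `align` on another domain's candidate scores. A library such as DEAP could replace the hand-written loop; the loop is kept explicit here only to make the selection and mutation steps visible.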