Term extraction may be defined as a text mining activity whose main purpose is to obtain all the terms included in a text of a given domain. Since the 1980s, mainly because of rapid scientific advances and the evolution of communication systems, there has been growing interest in obtaining the terms found in written documents. A number of techniques and strategies have been proposed to meet this need, and term extraction now seems to have reached maturity. Nevertheless, many of the proposed systems fail to present their results qualitatively, and almost every system evaluates its abilities in an ad hoc manner, if it evaluates them at all. Often, the authors do not explain their evaluation methodology, so comparisons between different implementations are difficult to draw. In this paper, we review the state of the art in the evaluation of term extraction systems within the framework of natural language systems evaluation. The main approaches are presented, with a focus on their limitations. As an instantiation of some ideas for overcoming these limitations, the evaluation framework is applied to YATE, a hybrid term extractor.