Evaluation of terms and term extraction systems: A practical approach

Vivaldi, Jorge; Rodríguez, Horacio

doi:10.1075/term.13.2.06viv

Article published in:

Terminology
Vol. 13:2 (2007), pp. 225–248


Term extraction may be defined as a text mining activity whose main purpose is to obtain all the terms contained in a text of a given domain. Since the 1980s, mainly because of rapid scientific progress and the evolution of communication systems, there has been growing interest in obtaining the terms found in written documents. A number of techniques and strategies have been proposed to meet this need, and term extraction now appears to have reached maturity. Nevertheless, many of the proposed systems fail to present their results in qualitative terms: almost every system evaluates its performance in an ad hoc manner, and many are not evaluated at all. Authors often do not explain their evaluation methodology, so comparisons between different implementations are difficult to draw. In this paper, we review the state of the art in the evaluation of term extraction systems within the broader framework of natural language systems evaluation. The main approaches are presented, with a focus on their limitations. As an illustration of some ideas for overcoming these limitations, the evaluation framework is applied to YATE, a hybrid term extractor.
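To make the evaluation problem concrete, one common baseline (not necessarily the methodology proposed in this paper) compares a system's extracted terms against a reference term list and reports precision, recall and F1. The sketch below assumes exact matching after lowercasing and whitespace normalization; the function names `normalize` and `evaluate` and the sample terms are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable


def normalize(term: str) -> str:
    """Lowercase and collapse internal whitespace so trivial variants match.

    Assumption: real evaluations may also need lemmatization or variant handling.
    """
    return " ".join(term.lower().split())


def evaluate(extracted: Iterable[str], reference: Iterable[str]) -> Dict[str, float]:
    """Precision, recall and F1 of an extracted term list against a reference list."""
    found = {normalize(t) for t in extracted}
    gold = {normalize(t) for t in reference}
    hits = len(found & gold)  # extracted terms that also appear in the reference list
    precision = hits / len(found) if found else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical system output and reference list for a medical text.
    extracted = ["blood pressure", "Heart  rate", "patient", "myocardial infarction"]
    reference = ["blood pressure", "heart rate", "myocardial infarction", "angina"]
    print(evaluate(extracted, reference))
```

Here three of the four candidates match the reference list, so precision and recall are both 0.75. Exact matching is the simplest choice; it penalizes a system for valid terms missing from the reference list, which is one of the limitations an evaluation framework has to take into account.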

Keywords: term extraction, term extractor evaluation, evaluation

Published online: 19 November 2007


Cited by 19 other publications

Malyuga, Elena N.

2024. The Functional-Pragmatic Dimension of Corporate Communication. In The Language of Corporate Communication, pp. 1 ff.

Gallego-Hernández, Daniel

2022. Extracción de fraseología especializada basada en corpus. Revista Española de Lingüística Aplicada/Spanish Journal of Applied Linguistics 35:1, pp. 294 ff.

Nugumanova, Aliya, Darkhan Akhmed-Zaki, Madina Mansurova, Yerzhan Baiburin & Almasbek Maulit

2022. NMF-based approach to automatic term extraction. Expert Systems with Applications 199, pp. 117179 ff.

Kwong, Oi Yee

2021. User-driven assessment of commercial term extractors. Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication 27:2, pp. 179 ff.

Paletta, Francisco Carlos & José-Antonio Moreiro-González

2021. La transformación digital en los métodos y temas de la investigación brasileña de Información y Documentación 2010-2019. Revista Española de Documentación Científica 44:2, pp. e293 ff.

Zhao, Ziyan, Li Zhang & Xiaoli Lian

2021. In 2021 IEEE 29th International Requirements Engineering Conference (RE), pp. 24 ff.

Rigouts Terryn, Ayla, Véronique Hoste & Els Lefever

2020. In no uncertain terms: a dataset for monolingual and multilingual automatic term extraction from comparable corpora. Language Resources and Evaluation 54:2, pp. 385 ff.

Pajić, Vesna, Staša Vujičić Stanković, Ranka Stanković & Miloš Pajić

2018. Semi-automatic extraction of multiword terms from domain-specific corpora. The Electronic Library 36:3, pp. 550 ff.

Periñán-Pascual, Carlos

2018. DEXTER: A workbench for automatic term extraction with specialized corpora. Natural Language Engineering 24:2, pp. 163 ff.

Oliver, Antoni

2017. A system for terminology extraction and translation equivalent detection in real time. Machine Translation 31:3, pp. 147 ff.

Heylen, Kris & Dirk De Hertog

2015. Automatic Term Extraction. In Handbook of Terminology [Handbook of Terminology, 1], pp. 203 ff.

Bernier-Colborne, Gabriel & Patrick Drouin

2014. Creating a test corpus for term extractors through term annotation. Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication 20:1, pp. 50 ff.

da Silva Conrado, Merley, Ariani Di Felippo, Thiago Alexandre Salgueiro Pardo & Solange Oliveira Rezende

2014. A survey of automatic term extraction for Brazilian Portuguese. Journal of the Brazilian Computer Society 20:1.

Marín, María José

2014. Evaluation of five single-word term recognition methods on a legal English corpus. Corpora 9:1, pp. 83 ff.

Vivaldi, Jorge & Horacio Rodríguez

2014. In 2014 Third IEEE International Colloquium in Information Science and Technology (CIST), pp. 248 ff.

Conrado, Merley S., Thiago A. S. Pardo & Solange O. Rezende

2013. Exploration of a Rich Feature Set for Automatic Term Extraction. In Advances in Artificial Intelligence and Its Applications [Lecture Notes in Computer Science, 8265], pp. 342 ff.

van der Plas, Lonneke, Jörg Tiedemann & Ismail Fahmi

2011. Automatic Extraction of Medical Term Variants from Multilingual Parallel Translations. In Interactive Multi-modal Question-Answering, pp. 149 ff.

[no author supplied]

2016. Pilot-Controller Communication Problems and an Initial Exploration of Language-Engineering Technologies as a Potential Solution. In Human Factors in Transportation [Industrial and Systems Engineering Series], pp. 297 ff.

[no author supplied]

2017. Term Variation in Specialised Corpora [Terminology and Lexicography Research and Practice, 19].

This list is based on CrossRef data as of 10 July 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.