Improving term extraction by combining different techniques

Vivaldi, Jorge; Rodríguez, Horacio

doi:10.1075/term.7.1.04viv

Article published in:

Terminology

Vol. 7:1 (2001), pp. 31–48


Jorge Vivaldi | Universitat Pompeu Fabra

Horacio Rodríguez | Universitat Politècnica de Catalunya

Two reasons suggest that combining several term extractors could improve overall system accuracy. On the one hand, there is no clear agreement on whether (semi-)automatic term extraction should follow statistical, linguistic or hybrid approaches. On the other hand, combining different knowledge sources (e.g. classifiers) has proved successful in improving on the performance of the individual sources in several NLP tasks, some of them closely related to or involved in term extraction, such as context-sensitive spelling correction, part-of-speech tagging, word sense disambiguation, parsing, and text classification and filtering.

In this paper, we present a proposal for combining several term extraction techniques in order to improve the accuracy of the resulting system. The approach has been applied to Spanish texts in the medical domain, and a number of tests have been carried out with encouraging results.
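The abstract does not say how the individual extractors are combined. Purely as a generic illustration of classifier combination (not the authors' method), the sketch below uses weighted majority voting: each extractor accepts or rejects each candidate, and a candidate is kept as a term when the weighted share of accepting extractors exceeds a threshold. The function name `combine_by_voting`, the extractor names, the weights and the Spanish candidates are all hypothetical.

```python
from collections import defaultdict


def combine_by_voting(decisions, weights=None, threshold=0.5):
    """Hypothetical weighted-voting combiner for term extractors.

    decisions: dict mapping extractor name -> dict mapping candidate -> bool
               (True if that extractor accepts the candidate as a term).
    weights:   optional dict mapping extractor name -> vote weight
               (defaults to 1.0 for every extractor, i.e. plain majority voting).
    threshold: a candidate is kept if its weighted support, divided by the
               total weight of all extractors, is strictly above this value.
    """
    if weights is None:
        weights = {name: 1.0 for name in decisions}
    total = sum(weights[name] for name in decisions)

    # Accumulate the weight of every extractor that accepts each candidate.
    support = defaultdict(float)
    for name, votes in decisions.items():
        for candidate, accepted in votes.items():
            if accepted:
                support[candidate] += weights[name]

    # Keep candidates whose normalized support exceeds the threshold.
    return sorted(c for c, s in support.items() if s / total > threshold)


# Hypothetical decisions from three illustrative extractors.
decisions = {
    "linguistic": {"insuficiencia cardiaca": True, "paciente joven": True,
                   "ácido acetilsalicílico": True},
    "statistical": {"insuficiencia cardiaca": True, "paciente joven": False,
                    "ácido acetilsalicílico": True},
    "semantic": {"insuficiencia cardiaca": True, "paciente joven": False,
                 "ácido acetilsalicílico": False},
}

print(combine_by_voting(decisions))
```

With equal weights, "paciente joven" is accepted by only one of three extractors and is discarded, while the other two candidates reach a majority. Raising the weight of a single extractor (for instance `{"linguistic": 3.0, "statistical": 1.0, "semantic": 1.0}`) lets that extractor outvote the others, which is one simple way to encode greater trust in a particular technique.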

Keywords: term extraction, semantic data, medicine, statistics

Published online: 7 December 2001

