Mining term similarities from corpora

Nenadic, Goran; Spasic, Irena; Ananiadou, Sophia

doi:10.1075/term.10.1.04nen

Article published in:

Recent Trends in Computational Terminology
Edited by Béatrice Daille, Kyo Kageura, Hiroshi Nakagawa and Lee-Feng Chien
[Terminology 10:1] 2004
pp. 55–80

In this article, we present an approach to the automatic discovery of term similarities, which may serve as a basis for a number of term-oriented knowledge mining tasks. The method for term comparison combines an internal criterion (lexical similarity) with two types of external criteria (syntactic and contextual similarity). Lexical similarity is based on shared lexical constituents (i.e., term heads and modifiers). Syntactic similarity relies on a set of specific lexico-syntactic co-occurrence patterns that indicate the parallel usage of terms (e.g., within an enumeration or within a term coordination/conjunction structure), while contextual similarity is based on the usage of terms in similar contexts. These contexts are identified automatically by a pattern mining approach, and a procedure is proposed to assess their domain-specific and terminological relevance. Although collected automatically, the context patterns are domain dependent. The different types of similarity are combined into a hybrid similarity measure, which can be tuned for a specific domain by learning optimal weights for the individual similarities. The proposed measure has been tested in the domain of biomedicine, and experiments in that domain are presented.
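The abstract does not give formulas, so the following Python sketch is only an illustration of the general idea: a lexical score built from shared heads and modifiers, and a weighted combination of three similarity scores. The function names, the choice of the last token as the term head, the Dice overlap on modifiers, and the default weights are all assumptions made for this example, not the authors' definitions.

```python
def constituents(term):
    """Split a term into (head, modifiers).

    Assumption: the head is the last token and all preceding tokens
    are modifiers; real term analysis would need proper parsing.
    """
    tokens = term.lower().split()
    if not tokens:
        raise ValueError("term must contain at least one token")
    return tokens[-1], set(tokens[:-1])


def lexical_similarity(term_a, term_b, head_weight=0.5):
    """Illustrative lexical similarity based on shared constituents.

    Combines a head match (0 or 1) with a Dice overlap of modifiers.
    head_weight is a hypothetical parameter, not taken from the article.
    """
    head_a, mods_a = constituents(term_a)
    head_b, mods_b = constituents(term_b)
    head_score = 1.0 if head_a == head_b else 0.0
    if not mods_a and not mods_b:
        # No modifiers on either side: only the heads can be compared.
        return head_score
    mod_score = 2 * len(mods_a & mods_b) / (len(mods_a) + len(mods_b))
    return head_weight * head_score + (1 - head_weight) * mod_score


def hybrid_similarity(lexical, syntactic, contextual, weights=(1.0, 1.0, 1.0)):
    """Weighted average of the three similarity scores.

    The weights are placeholders; the article describes learning
    domain-specific weights, which is not shown here.
    """
    w_lex, w_syn, w_con = weights
    total = w_lex + w_syn + w_con
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_lex * lexical + w_syn * syntactic + w_con * contextual) / total


if __name__ == "__main__":
    lex = lexical_similarity("nuclear receptor", "orphan nuclear receptor")
    # Syntactic and contextual scores are made-up inputs for illustration.
    print(hybrid_similarity(lex, syntactic=0.0, contextual=0.5, weights=(2, 1, 1)))
```

In this sketch the syntactic and contextual scores are passed in as plain numbers; in practice they would come from pattern matching over a corpus, which is beyond what the abstract specifies.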

Keywords: automatic terminology management, term similarity, contextual similarity, pattern mining, term clustering

Published online: 10 June 2004

