In this article, we present an approach to the automatic discovery of term similarities, which may serve as a basis for a number of term-oriented knowledge mining tasks. The method for term comparison combines one internal criterion (lexical similarity) with two external criteria (syntactic and contextual similarity). Lexical similarity is based on shared lexical constituents (i.e., term heads and modifiers). Syntactic similarity relies on a set of specific lexico-syntactic co-occurrence patterns that indicate the parallel usage of terms (e.g., within an enumeration or a term coordination/conjunction structure), while contextual similarity is based on the usage of terms in similar contexts. Such contexts are identified automatically by a pattern mining approach, and a procedure is proposed to assess their domain-specific and terminological relevance. Although collected automatically, these patterns are domain dependent and identify the contexts in which terms are used. The different types of similarity are combined into a hybrid similarity measure, which can be tuned for a specific domain by learning optimal weights for the individual similarities. The suggested similarity measure has been tested in the domain of biomedicine, and experimental results are presented.
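The combination of similarities described above can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything here is an assumption made for illustration: the head of a term is taken to be its last token, modifier overlap is scored with a Dice coefficient, and the hybrid measure is a normalized weighted sum. The names `dice`, `lexical_similarity`, and `hybrid_similarity` are hypothetical and are not taken from the article.

```python
def dice(a: set, b: set) -> float:
    """Dice coefficient of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def lexical_similarity(term1: str, term2: str) -> float:
    """Score shared lexical constituents (head and modifiers).

    Assumption: the head is the last whitespace-separated token and
    all preceding tokens are modifiers. Head match and modifier
    overlap are weighted equally, which is an arbitrary choice.
    """
    t1, t2 = term1.lower().split(), term2.lower().split()
    if not t1 or not t2:
        return 0.0
    head_match = 1.0 if t1[-1] == t2[-1] else 0.0
    modifier_overlap = dice(set(t1[:-1]), set(t2[:-1]))
    return 0.5 * head_match + 0.5 * modifier_overlap


def hybrid_similarity(lexical: float, syntactic: float, contextual: float,
                      weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """Combine three similarity scores as a normalized weighted sum.

    Assumption: a linear combination. The weights stand in for the
    domain-specific values that would be learned from data.
    """
    w_lex, w_syn, w_ctx = weights
    total = w_lex + w_syn + w_ctx
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_lex * lexical + w_syn * syntactic + w_ctx * contextual) / total


if __name__ == "__main__":
    ls = lexical_similarity("nuclear receptor", "orphan nuclear receptor")
    # Syntactic and contextual scores are placeholder values here; in the
    # approach described above they would come from co-occurrence patterns
    # and mined context patterns respectively.
    print(hybrid_similarity(ls, syntactic=0.5, contextual=0.2,
                            weights=(0.5, 0.25, 0.25)))
```

In this sketch, tuning for a domain would amount to choosing the `weights` tuple that best reproduces known term relationships in that domain; how the weights are actually learned is not specified in the abstract.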