The recognition and extraction of terms and their variants in texts are crucial processes in text mining. We use the ILC platform, an automatic controlled indexing platform, to perform these linguistic processes. We present a methodology for improving the recognition of syntactic term variation in English using syntactic and morpho-syntactic features. Most spurious term variants can be attributed to incorrect word dependencies. To overcome this problem, we treat each term variant as a window on the sentence and introduce two criteria: an internal syntactic criterion, which checks that the dependencies between the words in the window are respected, and an external criterion, which defines boundaries to ensure that the window is correctly positioned in the sentence. Applying these criteria improves the filtering of variants and helps the expert validate the indexing.
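The two criteria can be pictured as checks on a dependency parse. The sketch below is a hypothetical reading, not the ILC platform's implementation. It assumes that each token carries the index of its syntactic head, as any dependency parser produces, and that a candidate variant is a half-open token window `[start, end)`. Under that assumption, the internal criterion means "the window forms one connected dependency unit with a single head", and the external criterion means "no token outside the window attaches to a non-head token inside it", so the window does not cut through a larger phrase. The names `Token`, `window_head`, `internal_criterion`, `external_criterion` and `tokens_from` are illustrative, and so are the hand-built toy parses.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    form: str
    head: int  # index of the syntactic head in the sentence; -1 marks the root


def tokens_from(words: List[str], heads: List[int]) -> List[Token]:
    """Build a toy parsed sentence from parallel word and head lists."""
    return [Token(w, h) for w, h in zip(words, heads)]


def window_head(tokens: List[Token], start: int, end: int) -> Optional[int]:
    """Return the only token in [start, end) whose head lies outside the window,
    or None if there is not exactly one such token."""
    outside = [i for i in range(start, end) if not start <= tokens[i].head < end]
    return outside[0] if len(outside) == 1 else None


def internal_criterion(tokens: List[Token], start: int, end: int) -> bool:
    # Hypothetical reading: dependencies are respected when every token in the
    # window attaches inside it, except a single window head. Because a parse
    # is a tree, this makes the window one connected unit.
    return window_head(tokens, start, end) is not None


def external_criterion(tokens: List[Token], start: int, end: int) -> bool:
    # Hypothetical reading: the window is well positioned when no outside token
    # attaches to a non-head token inside it. Modifiers of the window head
    # (e.g. "high" in "high blood pressure") are still allowed.
    head = window_head(tokens, start, end)
    if head is None:
        return False
    for i, tok in enumerate(tokens):
        if not start <= i < end and start <= tok.head < end and tok.head != head:
            return False
    return True


# "cell growth" in "the cell growth rate increased": both criteria hold.
good = tokens_from("the cell growth rate increased".split(), [3, 2, 3, 4, -1])
print(internal_criterion(good, 1, 3), external_criterion(good, 1, 3))  # True True

# "cell ... growth" spanning two coordinated clauses: the words do not form
# one dependency unit, so the internal criterion rejects the window.
spurious = tokens_from("the cell divided and growth stopped".split(), [1, 2, -1, 5, 5, 2])
print(internal_criterion(spurious, 1, 5), external_criterion(spurious, 1, 5))  # False False

# "cell growth" inside "the stem cell growth slowed": the window is internally
# coherent, but "stem" attaches to "cell", so the window's left edge cuts the
# compound "stem cell" and the external criterion rejects it.
cut = tokens_from("the stem cell growth slowed".split(), [3, 2, 3, 4, -1])
print(internal_criterion(cut, 2, 4), external_criterion(cut, 2, 4))  # True False
```

Checking both criteria lets a filter separate two failure modes. A window can be internally incoherent because it groups words from unrelated phrases. It can also be coherent but badly bounded because it slices a larger unit. Variants that fail either check would then be candidates for removal before expert validation.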