Corpus-based extension of a terminological semantic lexicon
Benoît Habert | UMR 8503–École Normale Supérieure de Fontenay St Cloud
J. Bouaud | UMR 8503–École Normale Supérieure de Fontenay St Cloud
This paper addresses the problem of extending and tuning a terminological semantic lexicon to new domains and corpora. We argue that by relying on both a sublanguage corpus and a core semantic lexicon, it is possible to give an adequate description of the words that occur in the corpus. Our tuning method explores the corpus and gathers words that are likely to have similar meanings on the basis of their dependency relationships in the corpus. The aim of the present work is to assess the potential for classifying words based on the semantic categories of their “neighbors”. The tagging procedure is tested and parameterized on a rather small French corpus dealing with coronary diseases (85,000 word units). The method is systematically evaluated by creating and categorizing artificial unknown words. Although semantic categorization of words cannot be fully automated, the results show that our tagging procedure is a valuable aid in accounting for new words and new word uses in a sublanguage.
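The idea summarized above (propose a category for an unknown word from the categories of words that share its dependency contexts, then evaluate by hiding the category of known words) can be illustrated with a minimal sketch. This is not the authors' implementation: the triple format, the shared-context vote, the function names, and the toy data are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter, defaultdict

# Toy dependency triples (head, relation, dependent). Invented for illustration,
# not taken from the paper's corpus.
DEPENDENCIES = [
    ("stenosis", "of", "artery"),
    ("occlusion", "of", "artery"),
    ("lesion", "of", "artery"),
    ("stenosis", "of", "vein"),
    ("lesion", "of", "vein"),
    ("stenosis", "mod", "severe"),
    ("lesion", "mod", "severe"),
]

# Hypothetical core semantic lexicon: word -> category. "lesion" is unknown.
LEXICON = {
    "stenosis": "pathology",
    "occlusion": "pathology",
    "artery": "anatomy",
    "vein": "anatomy",
}


def build_contexts(dependencies):
    """Map each word to the set of dependency contexts it occurs in."""
    contexts = defaultdict(set)
    for head, relation, dependent in dependencies:
        contexts[head].add((relation, "head", dependent))
        contexts[dependent].add((relation, "dep", head))
    return dict(contexts)


def neighbors(word, contexts, min_shared=1):
    """Return {other_word: shared_context_count} for words sharing contexts."""
    word_contexts = contexts.get(word, set())
    result = {}
    for other, other_contexts in contexts.items():
        if other == word:
            continue
        shared = len(word_contexts & other_contexts)
        if shared >= min_shared:
            result[other] = shared
    return result


def propose_category(word, contexts, lexicon, min_shared=1):
    """Vote among categorized neighbors, weighted by shared contexts (assumed scheme)."""
    votes = Counter()
    for other, shared in neighbors(word, contexts, min_shared).items():
        category = lexicon.get(other)
        if category is not None:
            votes[category] += shared
    return votes.most_common(1)[0][0] if votes else None


def evaluate(contexts, lexicon, min_shared=1):
    """Treat each known word as an artificial unknown and try to recover its category."""
    correct = total = 0
    for word, gold in lexicon.items():
        if word not in contexts:
            continue
        masked = {w: c for w, c in lexicon.items() if w != word}
        total += 1
        if propose_category(word, contexts, masked, min_shared) == gold:
            correct += 1
    return correct / total if total else 0.0


CONTEXTS = build_contexts(DEPENDENCIES)

if __name__ == "__main__":
    print(propose_category("lesion", CONTEXTS, LEXICON))
    print(evaluate(CONTEXTS, LEXICON))
```

In this sketch, `min_shared` stands in for the kind of parameter the abstract says is tuned on the corpus. The proposal is a suggestion to be checked by a person, in line with the abstract's remark that categorization cannot be fully automated.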
Cited by three other publications
Grabar, Natalia & Thierry Hamon. 2006. Terminology Structuring Through the Derivational Morphology. In Advances in Natural Language Processing [Lecture Notes in Computer Science, 4139], pp. 652 ff.
This list is based on CrossRef data as of 11 July 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.