Creating a test corpus for term extractors through term annotation
In this paper, we describe a methodology used to create a test corpus for the evaluation of term extractors. This methodology relies on term annotation: terms in a corpus on automotive engineering are selected based on specific criteria pertaining to the terminological setting as well as linguistic and formal properties of terms and term variations. The test corpus accounts for the variety of ways in which terms are realized in running text, and provides a means of automatically evaluating the relevance of term candidate lists produced by term extractors. Due to the XML annotation scheme used, the corpus can be customized, e.g. by filtering out some of the annotated terms based on the type of term or term variation, or frequency. In this paper, we focus on the methodological aspects of this work.
Cited by other publications
This list is based on CrossRef data as of 24 august 2020. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.