Creating a test corpus for term extractors through term annotation
In this paper, we describe a methodology for creating a test corpus for the evaluation of term extractors. The methodology relies on term annotation: terms in a corpus on automotive engineering are selected according to specific criteria pertaining to the terminological setting and to the linguistic and formal properties of terms and term variations. The test corpus accounts for the variety of ways in which terms are realized in running text, and provides a means of automatically evaluating the relevance of term candidate lists produced by term extractors. Because the annotation scheme is XML-based, the corpus can be customized, e.g. by filtering out some of the annotated terms according to the type of term or term variation, or to frequency. We focus here on the methodological aspects of this work.
Keywords: term extractor evaluation, corpus annotation, test corpus, term extraction, evaluation, term variation, terminological variation
Published online: 25 April 2014
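The abstract notes that the XML annotation allows the corpus to be filtered by type of term or term variation, or by frequency, and that the corpus can then be used to evaluate term candidate lists automatically. The following is a minimal sketch of that idea, not the annotation scheme or the evaluation procedure actually used: the `term` element, its `type` and `lemma` attributes, the sample sentences, and the function names `reference_terms` and `precision_recall` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical annotated fragment; the element and attribute names are
# illustrative assumptions, not the scheme used for the actual corpus.
SAMPLE = """<corpus>
  <s>The <term type="base" lemma="brake disc">brake disc</term> is bolted to the hub.</s>
  <s>Worn <term type="base" lemma="brake disc">brake discs</term> reduce braking power.</s>
  <s>The <term type="variant" lemma="brake disc">disc of the brake</term> overheats.</s>
  <s>Check the <term type="base" lemma="camshaft">camshaft</term> regularly.</s>
</corpus>"""


def reference_terms(xml_text, allowed_types=None, min_freq=1):
    """Return the set of annotated term lemmas that pass the filters."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for term in root.iter("term"):
        # Keep an annotation only if its type is allowed (None = keep all).
        if allowed_types is None or term.get("type") in allowed_types:
            counts[term.get("lemma")] += 1
    # Drop lemmas annotated fewer than min_freq times.
    return {lemma for lemma, n in counts.items() if n >= min_freq}


def precision_recall(candidates, reference):
    """Score a term candidate list against the reference set."""
    candidates = set(candidates)
    hits = candidates & reference
    precision = len(hits) / len(candidates) if candidates else 0.0
    recall = len(hits) / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    reference = reference_terms(SAMPLE, allowed_types={"base"})
    print(precision_recall(["brake disc", "hub"], reference))
```

Matching candidates to the reference by exact lemma string is a simplification chosen to keep the sketch short; a real evaluation would need to decide how term variations and partial matches are counted.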