In this paper, we describe a methodology for creating a test corpus for the evaluation of term extractors. This methodology relies on term annotation: terms in a corpus on automotive engineering are selected based on specific criteria pertaining to the terminological setting as well as the linguistic and formal properties of terms and term variations. The test corpus accounts for the variety of ways in which terms are realized in running text and provides a means of automatically evaluating the relevance of term candidate lists produced by term extractors. Because the annotation scheme is XML-based, the corpus can be customized, e.g. by filtering out some of the annotated terms based on the type of term or term variation, or on frequency. We focus here on the methodological aspects of this work.
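As a rough illustration of how such a corpus might be used, the sketch below builds a reference term list from XML annotations, optionally filters it by annotation type and frequency, and scores a candidate list against it with precision and recall. The `<term>` element, its `type` and `lemma` attributes, the type labels, and the function names are all hypothetical; they are assumptions made for this example and are not the annotation scheme or evaluation procedure described in the paper.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical annotated sample. The <term> tag and the "type" and "lemma"
# attributes are illustrative assumptions, not the paper's actual scheme.
SAMPLE = """<corpus>
  <s>The <term type="base" lemma="brake pad">brake pads</term> press on the
     <term type="base" lemma="brake disc">brake disc</term>.</s>
  <s>Worn <term type="reduction" lemma="brake pad">pads</term> damage the
     <term type="base" lemma="brake disc">brake disc</term>.</s>
  <s>The <term type="base" lemma="torque converter">torque converter</term>
     is unaffected.</s>
</corpus>"""


def reference_terms(xml_text, excluded_types=(), min_freq=1):
    """Return the set of annotated term lemmas that survive the filters."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        term.get("lemma")
        for term in root.iter("term")
        if term.get("type") not in excluded_types
    )
    return {lemma for lemma, n in counts.items() if n >= min_freq}


def evaluate(candidates, reference):
    """Compute precision and recall of a candidate list against a reference set."""
    candidate_set = set(candidates)
    hits = len(candidate_set & reference)
    precision = hits / len(candidate_set) if candidate_set else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    full_reference = reference_terms(SAMPLE)
    strict_reference = reference_terms(
        SAMPLE, excluded_types={"reduction"}, min_freq=2
    )
    extractor_output = ["brake pad", "brake disc", "engine", "wheel"]
    print(evaluate(extractor_output, full_reference))
    print(evaluate(extractor_output, strict_reference))
```

Filtering happens when the reference list is built, so the same annotated corpus can yield several reference lists of differing strictness without being re-annotated.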