The underpinnings of a composite measure for automatic term extraction
The case of SRC
The corpus-based identification of the lexical units that describe a given specialized domain is usually a complex task, for which an analysis based on word frequency and the likelihood of lexical associations is often ineffective. The goal of this article is to demonstrate that a user-adjustable composite metric such as SRC can accommodate the diversity of domain-specific glossaries to be constructed from small- and medium-sized specialized corpora of unstructured texts. Unlike most research in automatic term extraction, where single metrics are usually combined indiscriminately to produce the best results, SRC is grounded in the theoretical principles of salience, relevance and cohesion, which have been rationally implemented in the three components of this metric.