Automatic Term Extraction

Kris Heylen and Dirk De Hertog

The general aim of Term Extraction (TE) is to identify the core vocabulary of a specialized domain. In traditional Manual Term Extraction (MTE), a terminologist lists potential Term Candidates (TCs) and then consults a domain expert to arrive at a final list of validated terms. However, technical vocabulary keeps growing in a rapidly changing world, and manually maintaining a domain’s core vocabulary or, for new technological fields, manually exploring, indexing and describing it, is a labour-intensive enterprise. Automatic Term Extraction (ATE) is meant first and foremost as a computerized aid to alleviate this time-consuming task. For now, ATE concentrates on automating the preliminary identification of Term Candidates. In the long run, ATE might replace MTE completely.
