Automatic Term Extraction

Kris Heylen & Dirk De Hertog


The general aim of Term Extraction (TE) is to identify the core vocabulary of a specialized domain. In traditional Manual Term Extraction (MTE), a terminologist lists potential Term Candidates (TC) and then consults a domain expert to arrive at a final list of validated terms. However, in a rapidly changing world with an ever-growing technical vocabulary, manually maintaining a domain’s core vocabulary or, for new technological fields, manually exploring, indexing and describing it, is a labour-intensive enterprise. Automatic Term Extraction (ATE) is meant first and foremost as a computerized aid to alleviate this time-consuming task. For now, ATE concentrates on automating the preliminary identification of Term Candidates. In the long run, ATE might replace MTE completely.


Related articles

Managing terminology in commercial environments