The general aim of Term Extraction (TE) is to identify the core vocabulary of a specialized domain. Traditional Manual Term Extraction (MTE) is carried out by a terminologist, who lists potential Term Candidates (TCs) and then consults a domain expert to arrive at a final list of validated terms. However, in a rapidly changing world with an ever-growing technical vocabulary, manually maintaining a domain’s core vocabulary or, for new technological fields, manually exploring, indexing and describing it is a labour-intensive enterprise. Automatic Term Extraction (ATE) is meant first and foremost as a computerized aid to alleviate this time-consuming task. For now, ATE concentrates on automating the preliminary identification of Term Candidates. In the long run, ATE might replace MTE completely.