This paper presents the first results of a new method for terminology extraction based on distributional analysis. The intuition behind the algorithm is that single- or multi-word lexical units that refer to specialised concepts show a characteristic co-occurrence pattern: they tend to appear in the same contexts as other conceptually related terms. For example, the term fluoxetine will systematically appear in the same sentences as related terms such as depression, serotonin reuptake inhibitor and obsessive–compulsive disorder. Terms co-occur with general vocabulary units as well, but not with the characteristic pattern that arises when a conceptual relation holds. The method was evaluated experimentally on a corpus of psychiatry journals from Spain and Latin America, and its results were significantly better than those of other methods.
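The co-occurrence intuition can be illustrated with a minimal sketch. This is not the algorithm evaluated in the paper, whose details are not given in this abstract. It only counts sentence-level co-occurrence on a toy corpus and reports, for one unit, which other units share its sentences most consistently. The corpus, the function names (`cooccurrence_counts`, `sentence_frequency`, `association_profile`) and the assumption that sentences are already segmented into lexical units are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Toy corpus (invented for illustration): each sentence is a list of
# lexical units that are assumed to be segmented already.
SENTENCES = [
    ["fluoxetine", "depression", "treatment"],
    ["fluoxetine", "depression", "patients"],
    ["fluoxetine", "serotonin reuptake inhibitor", "depression"],
    ["patients", "treatment", "results"],
    ["results", "study", "patients"],
]


def cooccurrence_counts(sentences):
    """For every unit, count the sentences it shares with each other unit."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        units = sorted(set(sentence))  # count each unit once per sentence
        for a, b in combinations(units, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def sentence_frequency(sentences):
    """Count the number of sentences in which each unit occurs."""
    freq = Counter()
    for sentence in sentences:
        freq.update(set(sentence))
    return freq


def association_profile(unit, counts, freq, top_n=3):
    """Return the top_n co-occurring units, each with the share of the
    unit's sentences in which that partner also appears."""
    total = freq[unit]
    return [(other, n / total) for other, n in counts[unit].most_common(top_n)]


counts = cooccurrence_counts(SENTENCES)
freq = sentence_frequency(SENTENCES)
print(association_profile("fluoxetine", counts, freq, top_n=1))
print(association_profile("patients", counts, freq))
```

In this toy corpus, fluoxetine shares every one of its sentences with depression, whereas a general unit such as patients spreads its co-occurrences over several partners. A real system would need a much larger corpus and a principled association measure rather than raw shares.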