Distributional analysis applied to terminology extraction: First results in the domain of psychiatry in Spanish

Nazar, Rogelio

doi:10.1075/term.22.2.01naz

Article published in: Terminology, Vol. 22:2 (2016), pp. 141–170


Rogelio Nazar | Universidad Católica de Valparaíso

This paper presents the first results of a new method for terminology extraction based on distributional analysis. The intuition behind the algorithm is that single-word or multi-word lexical units that refer to specialised concepts show a characteristic co-occurrence pattern: they tend to appear in the same contexts as other conceptually related terms. For example, the term fluoxetine will systematically appear in the same sentences as related terms such as depression, serotonin reuptake inhibitor and obsessive–compulsive disorder. Terms also co-occur with general vocabulary units, but without the characteristic pattern that appears when a conceptual relation holds. The method was evaluated experimentally on a corpus of psychiatry journals from Spain and Latin America, and the results were significantly better than those obtained with other methods.
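The sentence-level co-occurrence intuition can be illustrated with a minimal sketch. This is not the algorithm evaluated in the paper, whose details are not given in the abstract. The function name `cooccurrence_profile` and the toy English corpus are assumptions made purely for illustration. The sketch only measures, for a candidate term, the share of its sentences in which each other word also appears.

```python
from collections import Counter


def cooccurrence_profile(sentences, target):
    """Map each word to the share of `target`'s sentences in which it also occurs.

    `sentences` is a list of token lists. Hypothetical helper for illustration
    only; the paper's actual scoring is not described in the abstract.
    """
    counts = Counter()
    n_target = 0
    for sentence in sentences:
        tokens = set(sentence)  # count each word once per sentence
        if target in tokens:
            n_target += 1
            counts.update(tokens - {target})
    if n_target == 0:
        return {}
    return {word: count / n_target for word, count in counts.items()}


# Toy corpus (invented): "fluoxetine" always co-occurs with "depression".
toy_corpus = [
    ["fluoxetine", "treats", "depression"],
    ["fluoxetine", "reduces", "depression", "symptoms"],
    ["fluoxetine", "affects", "serotonin", "and", "depression"],
    ["the", "study", "shows", "results"],
    ["the", "patients", "show", "symptoms"],
]

profile = cooccurrence_profile(toy_corpus, "fluoxetine")

# List co-occurring words from the most to the least systematic.
for word, share in sorted(profile.items(), key=lambda item: (-item[1], item[0])):
    print(f"{word}\t{share:.2f}")
```

In this toy data the related term "depression" appears in every sentence that contains "fluoxetine", while incidental words appear in only a third of them. A real extractor would still need some way to separate such systematic co-occurrence from the behaviour of frequent general vocabulary.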

Keywords: terminology extraction, topic signatures, distributional semantics, co-occurrence, text-mining

Published online: 21 February 2017


