Article published in:
Lexical semantic approaches to terminology
Edited by Pamela Faber and Marie-Claude L'Homme
[Terminology 20:2] 2014
pp. 279–303
Clustering for semantic purposes
Exploration of semantic similarity in a technical corpus
This paper presents an innovative approach, within the framework of distributional semantics, to the exploration of semantic similarity in a technical corpus. Complementing a previous quantitative semantic analysis conducted in the same domain of machining terminology, it sets out to discover fine-grained semantic distinctions in order to explore the semantic heterogeneity of a number of technical items. Multidimensional scaling (MDS) analysis was carried out to cluster the first-order co-occurrences of a technical node according to the second-order and third-order co-occurrences they share. By taking into account the association values between relevant first-order and second-order co-occurrences, semantic similarities and dissimilarities between first-order co-occurrences could be determined and shown as proximities and distances on a graph. In our discussion of the methodology and results of statistical clustering techniques for semantic purposes, we pay special attention to the linguistic and terminological interpretation.
Keywords: Multidimensional scaling (MDS), distributional semantics, specialized corpora, second-order and third-order co-occurrences, semantic similarity
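The general idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the words and association scores below are hypothetical toy data, cosine distance is an assumed dissimilarity measure, and classical (Torgerson) MDS is used here as a simple stand-in for whichever MDS variant the paper applies. Each first-order co-occurrence is described by its association values with a set of second-order co-occurrences, and MDS then places first-order co-occurrences with similar profiles close together on a two-dimensional map.

```python
import numpy as np

# Hypothetical toy data (not from the paper): rows are first-order
# co-occurrences of a node word, columns are second-order co-occurrences,
# and each cell is an association score between the two.
first_order = ["tool", "spindle", "software", "control"]
assoc = np.array([
    [5.0, 4.0, 0.0, 0.5],
    [4.5, 3.5, 0.5, 0.0],
    [0.0, 0.5, 6.0, 4.0],
    [0.5, 0.0, 5.0, 4.5],
])

def cosine_distances(m):
    """Pairwise cosine distance (1 - cosine similarity) between rows."""
    unit = m / np.linalg.norm(m, axis=1, keepdims=True)
    # Clip tiny negative values caused by floating-point rounding.
    return np.clip(1.0 - unit @ unit.T, 0.0, None)

def classical_mds(dist, k=2):
    """Classical (Torgerson) MDS: embed a distance matrix in k dimensions."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:k]  # indices of the k largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

dist = cosine_distances(assoc)
coords = classical_mds(dist)
for word, (x, y) in zip(first_order, coords):
    print(f"{word:10s} {x: .3f} {y: .3f}")
```

In this toy example the first two words share one set of second-order co-occurrences and the last two share another, so the resulting map should show two separate groups. On real data the axes themselves carry no fixed meaning; only the relative distances between points are open to interpretation.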
Published online: 31 October 2014
https://doi.org/10.1075/term.20.2.07ber