Chapter 14. Terminology and distributional analysis of corpora

Bertels, Ann

doi:10.1075/tlrp.23.14ber

Part of

Theoretical Perspectives on Terminology: Explaining terms, concepts and specialized knowledge
Edited by Pamela Faber and Marie-Claude L'Homme
[Terminology and Lexicography Research and Practice 23] 2022
pp. 311–328

Ann Bertels | University of Leuven

This chapter discusses the theoretical and methodological principles of distributional semantic analysis. Over the last decade, Distributional Semantics has become very popular in Corpus Linguistics, building on very large corpora and extracting useful semantic information for numerous applications. In our “big data” era, Artificial Intelligence (AI) is carving its way into our daily life. AI’s algorithms in Natural Language Processing (NLP) learn from text collections with easily accessible information in order to find and predict knowledge patterns. This chapter explores the use of distributional analysis for terminological needs, i.e., in specialized domains. It focuses on what distributional analysis stands for, how it works, how it can be used for LSP and Terminology, and why it is useful for terminological needs.

Keywords: Distributional semantics, vector space models, specialized corpora, semantic similarity, semantic relatedness, co-occurrence analysis

Article outline

1. Introduction
2. Distributional Semantics
- 2.1 Theoretical and methodological principles
- 2.2 Overview of Distributional Semantic Models (DSMs)
  - 2.2.1 Differences with respect to the linguistic context
  - 2.2.2 Differences with respect to the level of analysis
  - 2.2.3 Differences with respect to the distributional representation
3. Distributional analysis of specialized corpora
- 3.1 Comparative studies applied to specialized corpora
- 3.2 Terminology extraction
- 3.3 Ontology building and taxonomy extraction
- 3.4 Semantic relations
- 3.5 Knowledge patterns and semantic frames
- 3.6 Terminological variation
4. Challenges of distributional analysis and issues raised by specialized corpora
- 4.1 Corpus size and data sparseness
- 4.2 Multi-word terms and compositionality
- 4.3 Specialized vocabulary and in-domain knowledge
5. Conclusions
Notes

Published online: 14 June 2022

https://doi.org/10.1075/tlrp.23.14ber
