Chapter 14
Terminology and distributional analysis of corpora
This chapter discusses the theoretical and methodological principles of distributional semantic analysis. Over the last decade, Distributional Semantics has become very popular in Corpus Linguistics, where it draws on very large corpora to extract semantic information for numerous applications. In our “big data” era, Artificial Intelligence (AI) is making its way into our daily lives. AI algorithms in Natural Language Processing (NLP) learn from text collections containing easily accessible information in order to find and predict knowledge patterns. This chapter explores the use of distributional analysis for terminological needs, i.e., in specialized domains. It focuses on what distributional analysis is, how it works, how it can be used for Language for Specific Purposes (LSP) and Terminology, and why it is useful for terminological needs.
Article outline
- 1. Introduction
- 2. Distributional Semantics
- 2.1 Theoretical and methodological principles
- 2.2 Overview of Distributional Semantic Models (DSMs)
- 2.2.1 Differences with respect to the linguistic context
- 2.2.2 Differences with respect to the level of analysis
- 2.2.3 Differences with respect to the distributional representation
- 3. Distributional analysis of specialized corpora
- 3.1 Comparative studies applied to specialized corpora
- 3.2 Terminology extraction
- 3.3 Ontology building and taxonomy extraction
- 3.4 Semantic relations
- 3.5 Knowledge patterns and semantic frames
- 3.6 Terminological variation
- 4. Challenges of distributional analysis and issues raised by specialized corpora
- 4.1 Corpus size and data sparseness
- 4.2 Multi-word terms and compositionality
- 4.3 Specialized vocabulary and in-domain knowledge
- 5. Conclusions
Notes