Vol. 28:1 (2022), pp. 1–36
Utilising heterogeneous language resources for term extraction in maritime domains
Developing terminologies for domains that lack them is a time-consuming and costly task. This article addresses a general methodological question: how can we, with limited funding, make maximal use of existing language resources to create a terminology at relatively low cost? Although an important player in the maritime industries for many centuries, Norway has not prioritised the systematic development of an official maritime terminology. The article therefore focuses specifically on efforts to develop a national resource for maritime domains. It describes the creation of a corpus of popular science texts and a parallel corpus of technical texts. Six term extraction methods are applied, including corpus-based statistical analyses of frequency, collocation and keyness, as well as bilingual term extraction. Finally, the pros and cons of each method are evaluated by means of a cost-benefit analysis.
Article outline
- 1. Introduction
- 2. Historical and theoretical background
- 3. Methods and criteria for term extraction in maritime domains
  - 3.1 Maritime domains
  - 3.2 Overview of term extraction methods
  - 3.3 Criteria for unithood and termhood
- 4. Methodological specifics and results from the various term extraction methods
  - 4.1 Method 1: Frequency analysis of domain-specific corpus
  - 4.2 Method 2: Keyness analysis of domain-specific vs. general corpus
  - 4.3 Method 3: Collocation analysis of domain-specific corpus
  - 4.4 Method 4: Chunking of aligned sentences from a parallel domain-specific corpus
  - 4.5 Method 5: Retrieval of terms from domain-specific lexical resources
  - 4.6 Method 6: Retrieval of domain-specific entries in bilingual general dictionary
- 5. Results
- 6. Concluding remarks
- Acknowledgements
- Notes
- References