Vol. 24:2 (2018) ► pp.149–180
Building controlled bilingual terminologies for the municipal domain and evaluating them using a coverage estimation approach
This article examines the status of constructed controlled terminologies from the perspective of the coverage of terms and concepts. To facilitate controlled authoring of Japanese texts in the municipal domain and promote machine translatability into English, we constructed terminologies in the following way: (1) Japanese-English term pairs were extracted from aligned texts; (2) term variations were controlled by defining preferred and proscribed terms for both languages. To assess the coverage of the constructed terminologies, we propose a quantitative extrapolation method that estimates the potential vocabulary size. The coverage estimations show that the coverage of terms for Japanese is higher than that for English by about 10%, which reflects the greater diversity of the translated English terms. The coverage of concepts reaches around 60% for both Japanese and English. The method also enables us to quantitatively estimate how much effort is needed to further increase the coverage.
Article outline
- 1. Introduction
- 2. Related work
- 2.1 Term extraction
- 2.2 Term variation management
- 2.3 Terminology evaluation
- 3. Building controlled bilingual terminology
- 3.1 Parallel corpus compilation
- 3.2 Term collection
- 3.2.1 Terms to be collected
- 3.2.2 Term extraction platform and procedure
- 3.2.3 Extracted terms
- 3.3 Typology of term variation
- 3.4 Terminology control
- 4. The coverage estimation approach to evaluate constructed terminologies
- 4.1 Self-referring coverage estimation
- 4.2 Expected number of terms
- 4.3 Growth rate of terms
- 4.4 Conditions to be observed
- 5. Results and discussions
- 5.1 Population types and present status of terminologies
- 5.2 Growth patterns of terminology
- 5.3 Examination of models with different data points
- 5.4 Use of lexical items for the coverage estimation of terminology
- 6. Conclusions and future work
- Acknowledgements
- Notes
References