Two methods for extracting “specific” single-word terms from specialized corpora: Experimentation and evaluation

Lemay, Chantal; L'Homme, Marie-Claude; Drouin, Patrick

doi:10.1075/ijcl.10.2.05lem

Article published in:

International Journal of Corpus Linguistics
Vol. 10:2 (2005), pp. 227–255

Chantal Lemay | Université de Montréal

Marie-Claude L'Homme | Université de Montréal

Patrick Drouin | Université de Montréal

Recently, corpus comparison has been used by a number of researchers for extracting single-word terms (SWTs) from specialized corpora. It is viewed as a means to supplement multi-word term (MWT) extraction, the focus of which is on noun phrases. However, little is known about the value of this technique in a terminological setting. This paper examines two different methods for finding French SWTs in the field of computing. The first one (M1) compares the specialized corpus to a corpus considered to be a reflection of language as a whole. The second one (M2) breaks down the specialized corpus into six topical subcorpora that are compared in turn to the entire specialized corpus. The calculation relies on the standard normal distribution and is carried out by a program called TermoStat. The specific units produced by both methods are then evaluated by comparing them to the contents of two specialized dictionaries. We also compare the results yielded by the two methods. Results show that precision is fair (approximately 50% of units extracted by both methods can be found in specialized dictionaries). However, recall is lower in both methods. Results also reveal that, even though M1 yields better results than M2, both methods are useful for identifying SWTs and should be considered in terminological work.
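
To make the idea of corpus comparison concrete, the following is a minimal sketch of how a specificity score based on the standard normal distribution could be computed, followed by a dictionary-based evaluation. It is an illustration under stated assumptions, not TermoStat's actual implementation: the normal approximation to the binomial, the function names `specificity_scores` and `precision_recall`, the threshold of 1.0, and the toy word lists are all hypothetical and do not come from the article.

```python
import math
from collections import Counter


def specificity_scores(part_tokens, rest_tokens):
    """Return a z-score per word of part_tokens.

    part_tokens: the corpus under analysis (the specialized corpus in M1,
                 one topical subcorpus in M2).
    rest_tokens: everything else in the comparison (the reference corpus in
                 M1, the remainder of the specialized corpus in M2).
    Assumption: observed frequency is compared with the frequency expected
    from the combined corpus, using a normal approximation to the binomial.
    """
    part = Counter(part_tokens)
    rest = Counter(rest_tokens)
    n_part = len(part_tokens)
    n_total = n_part + len(rest_tokens)
    scores = {}
    for word, observed in part.items():
        p = (observed + rest[word]) / n_total   # relative frequency overall
        expected = n_part * p                   # expected count in the part
        sd = math.sqrt(n_part * p * (1 - p))    # binomial standard deviation
        if sd > 0:
            scores[word] = (observed - expected) / sd
    return scores


def precision_recall(candidates, dictionary_terms):
    """Evaluate extracted candidates against a set of dictionary entries."""
    candidates = set(candidates)
    dictionary_terms = set(dictionary_terms)
    hits = candidates & dictionary_terms
    precision = len(hits) / len(candidates) if candidates else 0.0
    recall = len(hits) / len(dictionary_terms) if dictionary_terms else 0.0
    return precision, recall


# Toy data, invented for illustration only.
specialized = ["fichier", "logiciel", "fichier", "le", "de",
               "logiciel", "fichier", "le"]
reference = ["le", "de", "la", "maison", "le", "de",
             "chat", "le", "la", "de", "fichier", "le"]

scores = specificity_scores(specialized, reference)
# Hypothetical cut-off: keep words that are clearly over-represented.
candidates = {word for word, z in scores.items() if z > 1.0}
precision, recall = precision_recall(
    candidates, {"fichier", "logiciel", "octet", "pilote"}
)
```

On real corpora the counts are far larger, which is when a normal approximation becomes reasonable; with a handful of tokens, as here, the scores only show the direction of the effect. For M2, the same function would be called once per topical subcorpus, passing the remainder of the specialized corpus as `rest_tokens`.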

Keywords: corpus comparison, term-extraction, single-word term, terminology, specialized corpora

Published online: 14 June 2005

https://doi.org/10.1075/ijcl.10.2.05lem
