Classification-based scientific term detection in patient information

Hoste, Véronique; Vanopstal, Klaar; Lefever, Els; Delaere, Isabelle

doi:10.1075/term.16.1.01hos

Article published in:

Terminology
Vol. 16:1 (2010) ► pp.1–29

Although intended for the “average layman”, in terms of both readability and content, current patient information still contains many scientific terms. Several studies have concluded that the use of scientific terminology is one of the factors that greatly influence the readability of this patient information. The present study addresses the automatic recognition of overly scientific terminology as a first step towards replacing the recognized scientific terms with their popular counterparts. To this end, we experimented with two approaches: a dictionary-based approach and a learning-based approach trained on a rich feature vector. The research was conducted on a bilingual corpus of English and Dutch EPARs (European Public Assessment Reports). Our results show that we can extract scientific terms with high accuracy (> 80%, 10% below human performance) for both languages. Furthermore, we show that a lexicon-independent approach, which relies solely on orthographical and morphological information, is the most powerful predictor of the scientific character of a given term.
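
To make the idea of lexicon-independent orthographical and morphological information more concrete, the following is a minimal sketch of how such features might be computed for a single token. It is not the feature set used in the study: the function name `term_features`, the chosen features, and the short affix lists are all assumptions made for illustration.

```python
# Illustrative sketch only. The affix lists and feature names below are
# hypothetical and are not taken from the article.
SCI_PREFIXES = ("hyper", "hypo", "anti", "cardio", "neuro")
SCI_SUFFIXES = ("itis", "osis", "emia", "pathy", "ectomy")


def term_features(token: str) -> dict:
    """Return simple orthographic and morphological features for a token.

    No dictionary lookup is involved: every feature is derived from the
    surface form of the token itself.
    """
    lower = token.lower()
    return {
        # Orthographic features
        "length": len(token),
        "has_digit": any(ch.isdigit() for ch in token),
        "has_hyphen": "-" in token,
        "is_capitalized": token[:1].isupper(),
        # Morphological features (assumed affix lists, see above)
        "suffix3": lower[-3:],
        "sci_prefix": lower.startswith(SCI_PREFIXES),
        "sci_suffix": lower.endswith(SCI_SUFFIXES),
    }
```

A feature dictionary of this kind could then be passed to any standard classifier; which learner and which features work best is an empirical question that this sketch does not answer.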

Keywords: automatic term extraction, patient information, machine learning

Published online: 11 May 2010

Cited by seven other publications

Ivanović, Tanja, Ranka Stanković, Branislava Šandrih Todorović & Cvetana Krstev

2022. Corpus-based bilingual terminology extraction in the power engineering domain. Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication 28:2 ► pp. 228 ff.

Tran, Quoc Duyet, Haydar Demirhan & Anil Dolgun

2021. Bayesian approaches to the weighted kappa-like inter-rater agreement measures. Statistical Methods in Medical Research 30:10 ► pp. 2329 ff.

Tran, Quoc Duyet, Anil Dolgun & Haydar Demirhan

2021. The impact of grey zones on the accuracy of agreement measures for ordinal tables. BMC Medical Research Methodology 21:1

Azari, Razieh, Marziyeh Khalilizadeh Ganjalikhani & Anahita Amirshoja’i

2018. Legislation for patient information leaflets in Iran: Focus on lay-friendliness. Health Promotion Perspectives 8:4 ► pp. 263 ff.

De Clercq, Orphée, Véronique Hoste, Bart Desmet, Philip van Oosten, Martine De Cock & Lieve Macken

2014. Using the crowd for readability prediction. Natural Language Engineering 20:3 ► pp. 293 ff.

Marciniak, Małgorzata & Agnieszka Mykowiecka

2014. Terminology extraction from medical texts in Polish. Journal of Biomedical Semantics 5:1 ► pp. 24 ff.

Renahy, Julie, Izabella Thomas, Grégory Chippeaux, Bérenger Germain, Xavier Petiaux, Barbara Rath, Valérie de Grivel, Sylviane Cardey & Dominique A. Vuitton

2011. La «langue contrôlée» et l’informatisation de son utilisation au service de la qualité des textes médicaux et de la sécurité dans le domaine de la santé. In Systèmes d’information pour l’amélioration de la qualité en santé [Informatique et Santé, 1], ► pp. 97 ff.

This list is based on CrossRef data as of 10 July 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.