Clinical sublanguages
Vocabulary structure and its impact on term weighting
Due to its specific linguistic properties, the language found in clinical records has been characterized as a distinct sublanguage. Even within the clinical domain, though, there are major differences in language use, which has led to more fine-grained distinctions based on medical fields and document types. However, previous work has mostly neglected the influence of term variation. By contrast, we propose to integrate the potential for term variation in the characterization of clinical sublanguages. By analyzing a corpus of clinical records, we show that the different sections of these records vary systematically with regard to their lexical, terminological and semantic composition, as well as their potential for term variation. These properties have implications for automatic term recognition, as they influence the performance of frequency-based term weighting.
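To make the notion of frequency-based term weighting concrete, the following is a minimal sketch of one common measure of this kind: a ratio of a token's relative frequency in a domain corpus to its relative frequency in a general reference corpus. It is an illustration only and not necessarily the measure used in the study; the function name, the add-k smoothing, and the toy token lists are all assumptions introduced here.

```python
from collections import Counter


def relative_frequency_ratio(domain_tokens, reference_tokens, smoothing=1.0):
    """Weight each domain token by its relative frequency in the domain corpus
    divided by its smoothed relative frequency in a reference corpus.

    Hypothetical helper for illustration; not taken from the article.
    """
    domain_counts = Counter(domain_tokens)
    reference_counts = Counter(reference_tokens)
    domain_total = len(domain_tokens)
    reference_total = len(reference_tokens)
    vocab_size = len(set(domain_counts) | set(reference_counts))

    weights = {}
    for token, count in domain_counts.items():
        domain_rel = count / domain_total
        # Add-k smoothing (an assumption) keeps the ratio finite for tokens
        # that never occur in the reference corpus.
        reference_rel = (reference_counts[token] + smoothing) / (
            reference_total + smoothing * vocab_size
        )
        weights[token] = domain_rel / reference_rel
    return weights


# Toy, invented token lists standing in for a record section and a general corpus.
clinical_section = "patient denies dyspnea no dyspnea on exertion patient stable".split()
general_reference = "the patient waited for the bus and the bus was late".split()

weights = relative_frequency_ratio(clinical_section, general_reference)
ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
```

Because a weight of this kind depends only on counts, a section in which the same concept surfaces under several variant forms spreads its frequency mass over those forms, which is one way the vocabulary structure of a section can affect such a ranking.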
Article outline
- 1. Background
- 2. Related research
- 3. Sublanguages, semantic classes and variation types
- 3.1 Sublanguages
- 3.2 Classes of medical concepts
- 3.3 Types of variation
- 4. Corpus study 1: Characterization of sublanguages across sections
- 4.1 Corpus characteristics
- 4.2 Preprocessing
- 4.3 Annotation procedure and feature set
- 4.4 Research questions of corpus study 1
- 4.5 Results of corpus study 1
- 4.5.1 Global lexical structure
- 4.5.2 Distribution of semantic types across sections
- 4.5.3 Distribution of term types across sections
- 4.6 Discussion of corpus study 1
- 5. Corpus study 2: Impact of vocabulary structure on frequency-based term weighting
- 5.1 Research questions of corpus study 2
- 5.2 Corpus and preprocessing
- 5.3 Term filtering
- 5.4 Results of corpus study 2
- 5.4.1 Precision
- 5.4.2 Recall
- 5.5 Discussion of corpus study 2
- 6. Conclusion
- Notes
- References