Problematising characteristicness
A biomedical association case study
Keyness is a commonly used method in corpus linguistics and is assumed to identify key items that are characteristic of one corpus when compared to another. This paper puts this assumption to the test by comparing case study corpora in the fields of genetic, immunological and psychiatric biomedical association studies, using what we refer to as a ‘K-FLUX’ analysis to produce a set of key items. Experts from within these fields are asked to evaluate the extent to which the identified key items are characteristic of their discipline. The paper concludes that fewer than 50% of the items identified by the method are rated as highly characteristic by experts, and that this proportion varies between types of association study. Further, experts have difficulty reaching a consensus over what is deemed to be ‘characteristic’, which poses a challenge to the ultimate aim of the keyness method. The paper demonstrates the value of supporting corpus linguistic studies with expert assessments to evaluate whether (and which) items can be said to be indicative of a particular field.
Article outline
- 1. Introduction
- 2. Using keyness to determine characteristicness
- 3. Data
- 4. Words, lemmas and word families
- 5. Generating key items for evaluation
- 6. Evaluation studies
  - 6.1 Study 1: Pilot study
    - 6.1.1 Procedure
    - 6.1.2 Results and discussion
  - 6.2 Study 2: Wider evaluative study
    - 6.2.1 Procedure
    - 6.2.2 Results
    - 6.2.3 Discussion
- 7. General discussion and conclusion
- Notes
- References