Throughout the social sciences, there has been growing pressure to present effect sizes when publishing empirical data (see American Psychological Association, 2001; Parsons & Nelson, 2004). While it seems indisputable that effect size is an essential element of statistical analysis for the majority of quantitative research foci, this paper argues that, specifically for key word analysis in corpus linguistics, the means of reporting effect size must depend on the level of the unit of study in each investigation (single text, collection or large corpus). After exploring some of the main criticisms of the log-likelihood measure, this paper unpacks the parameters of different measures of keyness and how they might address the underlying concerns. It maintains that, for the exploration of foregrounded/deviant/salient/marked features in text, the use of log-likelihood scores to rank the results is still fit for purpose and, coupled with Bayes Factors, is a solid approach for key word analyses.
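To make the measures named in the abstract concrete, the following is a minimal sketch rather than the paper's own implementation. It assumes the two-corpus log-likelihood formulation associated with Rayson & Garside (2000), the BIC-style approximation to the Bayes Factor (log-likelihood minus the degrees of freedom times the natural log of the combined corpus size) associated with Wilson (2013), and Log Ratio as described by Hardie (2014a). The function names, the `zero_adjust` parameter and the example frequencies are hypothetical and chosen only for illustration.

```python
import math


def log_likelihood(a, b, n1, n2):
    """Log-likelihood (G2) for a word occurring a times in a study corpus
    of n1 tokens and b times in a reference corpus of n2 tokens."""
    total = n1 + n2
    e1 = n1 * (a + b) / total  # expected frequency in the study corpus
    e2 = n2 * (a + b) / total  # expected frequency in the reference corpus
    ll = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


def bic_bayes_factor(ll, n1, n2, df=1):
    """Approximate Bayes Factor on the BIC scale (assumed form: LL - df * ln N)."""
    return ll - df * math.log(n1 + n2)


def log_ratio(a, b, n1, n2, zero_adjust=0.5):
    """Binary log of the ratio of relative frequencies.
    zero_adjust is an assumed stand-in for zero counts, avoiding division by zero."""
    a_adj = a if a > 0 else zero_adjust
    b_adj = b if b > 0 else zero_adjust
    return math.log2((a_adj / n1) / (b_adj / n2))


if __name__ == "__main__":
    # Hypothetical counts: 200 hits in 10,000 tokens vs. 100 hits in 10,000 tokens.
    ll = log_likelihood(200, 100, 10_000, 10_000)
    print(f"LL = {ll:.2f}")
    print(f"BIC = {bic_bayes_factor(ll, 10_000, 10_000):.2f}")
    print(f"Log Ratio = {log_ratio(200, 100, 10_000, 10_000):.2f}")
```

In this sketch the log-likelihood score would be used to rank candidate key words, the BIC value to gauge the strength of evidence for each, and Log Ratio to express the size of the frequency difference.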
American Psychological Association
(2001) Publication Manual of the American Psychological Association (5th ed.). American Psychological Association.
Baker, P.
(2004) Querying keywords: Questions of difference, frequency, and sense in keywords analysis. Journal of English Linguistics, 32(4), 346–359.
Baker, P., Gabrielatos, C., Khosravinik, M., Krzyżanowski, M., McEnery, T., & Wodak, R.
(2008) A useful methodological synergy? Combining critical discourse analysis and corpus linguistics to examine discourses of refugees and asylum seekers in the UK press. Discourse & Society, 19(3), 273–306.
Bradley, J. V.
(1960) Distribution-free Statistical Tests. Air Research and Development Command.
Hardie, A.
(2014a) Log Ratio: An informal introduction. ESRC Centre for Corpus Approaches to Social Science (CASS). [URL]
Hardie, A.
(2014b) Statistical identification of keywords, lockwords and collocations as a two-step procedure [Paper presentation]. ICAME 35 Conference, University of Nottingham, Nottingham, UK.
Hoey, M.
(2005) Lexical Priming: A New Theory of Words and Language. Routledge.
(2006) Measures of effect size for chi-squared and likelihood-ratio goodness-of-fit tests. Perceptual and Motor Skills, 103(2), 412–414.
Kass, R. E., & Raftery, A. E.
(1995) Bayes Factors. Journal of the American Statistical Association, 90(430), 773.
Kilgarriff, A., Rychly, P., Smrz, P., & Tugwell, D.
(2004) The Sketch Engine [Paper presentation]. The 2003 International Conference on Natural Language Processing and Knowledge Engineering, Beijing, China.
Lee, D. Y. W.
(2001) Genres, registers, text types, domains, and styles: Clarifying the concepts and navigating a path through the BNC jungle. Language Learning and Technology, 5(3), 37–72.
Leech, G. N., Hundt, M., Mair, C., & Smith, N.
(2009) Change in Contemporary English: A Grammatical Study. Cambridge University Press.
Leech, G. N., & Short, M. H.
(2007) Style in Fiction: A Linguistic Introduction to English Fictional Prose (2nd ed.). Pearson Longman. (Original work published 1981)
Lexical Computing Ltd
(2014) Statistics used in the Sketch Engine. [URL]
Mahlberg, M.
(2013) Corpus Stylistics and Dickens’s Fiction. Routledge.
Mahlberg, M., Stockwell, P., de Joode, J., Smith, C., & O’Donnell, M. B.
(2016) CLiC Dickens: Novel uses of concordances for the integration of corpus stylistics and cognitive poetics. Corpora, 11(3), 433–463.
Oakes, M. P.
(1998) Statistics for Corpus Linguistics. Edinburgh University Press.
Parsons, T. D., & Nelson, N. W.
(2004) Paradigm shift in social science research: A significance testing and effect size estimation rapprochement? PsycCRITIQUES, 491(Suppl 3).
Partington, A.
(2010) Modern Diachronic Corpus-Assisted Discourse Studies (MD-CADS) on UK newspapers: An overview of the project. Corpora, 5(2), 83–108.
Plonsky, L., & Oswald, F. L.
(2014) How big is “big”? Interpreting effect sizes in L2 research. Language Learning, 64(4), 878–912.
Raftery, A. E.
(1986) A note on Bayes Factors for Log-Linear contingency table models with vague prior information. Journal of the Royal Statistical Society. Series B (Methodological), 48(2), 249–250.
Rayson, P.
(n.d.) UCREL Log-likelihood and effect size calculator. Retrieved November, 2019, from [URL]
(2004) Extending the Cochran rule for the comparison of word frequencies between corpora [Paper presentation]. The 7th International Conference on Statistical Analysis of Textual Data, Louvain-la-Neuve, Belgium. [URL]
Rayson, P., & Garside, R.
(2000) Comparing corpora using frequency profiling [Paper presentation]. The Workshop on Comparing Corpora, Hong Kong University of Science and Technology, Hong Kong. [URL]
(2013) Embracing Bayes Factors for key item analysis in corpus linguistics. In M. Bieswanger & A. Koll-Stobbe (Eds.), New Approaches to the Study of Linguistic Variability (pp. 3–12). Peter Lang.
Zipf, G. K.
(1935) The Psycho-Biology of Language: An Introduction to Dynamic Philology. Houghton Mifflin.
Cited by 6 other publications
Ballance, Oliver J. & Averil Coxhead
2024. Corpus Analysis of Vocabulary. In The Encyclopedia of Applied Linguistics, pp. 1 ff.
Gillings, Mathew, Gerlinde Mautner & Paul Baker
2023. Corpus-Assisted Discourse Studies,
Malory, Beth
2023. Locating the ‘Age of Prescriptivism’ in Late Modern periodical reviews: A corpus-assisted discourse analytic approach. Journal of Historical Sociolinguistics 9:2, pp. 263 ff.
Jeaco, Stephen
2020. DIY Needs Analysis and Specific Text Types: Using The Prime Machine to Explore Vocabulary in Readymade and Homemade English Corpora. In Vocabulary in Curriculum Planning, pp. 199 ff.
This list is based on CrossRef data as of 4 July 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers.
Any errors therein should be reported to them.