This study investigates the effect that reference corpora of different registers have on the content of keyword
lists. The study focuses on two target corpora and the keyword lists generated for each when three distinct reference
corpora are used. The two target corpora consist of published research by faculty at two PhD-granting programs in applied linguistics in
North America. The three reference corpora comprise, respectively, published research in applied linguistics, newspaper and magazine articles, and
fiction. The findings suggest that although common keywords representing each target corpus emerge regardless of
the reference corpus used in the analysis, there are also substantial differences. In particular, a reference corpus of the same
sub-register as the target corpus better highlights content unique to each target corpus, whereas a reference corpus of a
different register better uncovers words that reflect the register the target corpora represent. Implications for conducting
keyword analysis are discussed.
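To make concrete how the choice of reference corpus changes a keyword list, here is a minimal sketch. The abstract does not state which keyness statistic the study used; this sketch assumes Dunning's log-likelihood (G2), one commonly used keyness statistic, and it keeps only positive keywords with no significance cutoff or minimum frequency. The function names `log_likelihood` and `keywords` are hypothetical, and the word counts are invented toy data, not frequencies from the study's corpora.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood (G2) keyness score for a single word.

    a: frequency of the word in the target corpus
    b: frequency of the word in the reference corpus
    c: total tokens in the target corpus
    d: total tokens in the reference corpus
    """
    total = c + d
    e1 = c * (a + b) / total  # expected frequency in the target corpus
    e2 = d * (a + b) / total  # expected frequency in the reference corpus
    g2 = 0.0
    # Terms with an observed frequency of 0 contribute nothing (0 * log 0 := 0).
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(target: Counter, reference: Counter, top_n: int = 10) -> list[tuple[str, float]]:
    """Rank positive keywords of `target` against `reference` by G2 (no p-value cutoff)."""
    c = sum(target.values())
    d = sum(reference.values())
    scored = []
    for word, a in target.items():
        b = reference.get(word, 0)
        # Keep only words that are relatively more frequent in the target corpus.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy frequency lists (invented for illustration, not the study's data).
target = Counter({"learners": 30, "corpus": 20, "the": 100, "study": 25, "she": 2})
ref_academic = Counter({"learners": 5, "corpus": 18, "the": 110, "study": 30, "she": 3})
ref_fiction = Counter({"the": 120, "study": 2, "she": 60})

for name, ref in [("academic", ref_academic), ("fiction", ref_fiction)]:
    print(name, [(w, round(g2, 2)) for w, g2 in keywords(target, ref)])
```

With these toy counts, the same target frequencies produce different keyword lists: against the same-register (academic) reference, "learners" ranks first and "study" is not a keyword, while against the fiction reference "study" becomes a keyword because it is typical of academic prose in general. Real keyword tools usually also apply a significance threshold (for example G2 ≥ 3.84, p < 0.05 at 1 degree of freedom), which here would filter out marginal cases such as "corpus" against the academic reference.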