The influence of the benchmark corpus on keyword analysis
Punjaporn Pojanapunya | King Mongkut’s University of Technology Thonburi
Richard Watson Todd | King Mongkut’s University of Technology Thonburi
The growing popularity of keyword analysis as an applied linguistics methodology has not been matched by an increase in
the rigour with which the method is applied. While several studies have investigated the impact of choices made at certain stages of the
keyword analysis process, the impact of the choice of benchmark corpus has largely been overlooked. In this paper, we compare a target
corpus with several benchmark corpora and show that the keywords generated are different. We also show that certain characteristics of the
keyword list and of the keywords themselves vary in relatively predictable ways depending on the benchmark corpus. These variations have
implications for the choice of benchmark corpus and how the results of a keyword analysis should be interpreted. Analysing the keywords from
a comparison with a large general corpus or the keyword lists from multiple comparisons may be most appropriate for register studies.
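The abstract does not say which keyness statistic the study used. The following is therefore only a minimal sketch of how one target corpus can be compared against several benchmark corpora. It makes several assumptions:

- Log-likelihood (G², computed as 2 Σ O ln(O/E)) is the keyness measure.
- The cut-off is 3.84 (p < 0.05, one degree of freedom).
- Only positive keywords are kept.
- The corpora are tiny and invented.

The function names `log_likelihood`, `keywords` and `keywords_across` are hypothetical. They are not part of any tool used in the study.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 for a word seen a times in a target corpus of c tokens
    and b times in a benchmark corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the benchmark corpus
    # By convention, 0 * ln(0) is treated as 0.
    term1 = a * math.log(a / e1) if a > 0 else 0.0
    term2 = b * math.log(b / e2) if b > 0 else 0.0
    return 2 * (term1 + term2)


def keywords(target: list[str], benchmark: list[str],
             threshold: float = 3.84) -> dict[str, float]:
    """Positive keywords of `target` relative to `benchmark`, sorted by G2.
    The 3.84 threshold (p < 0.05, df = 1) is an assumed cut-off."""
    t_freq, b_freq = Counter(target), Counter(benchmark)
    c, d = len(target), len(benchmark)
    result = {}
    for word, a in t_freq.items():
        b = b_freq[word]  # Counter returns 0 for unseen words
        if a / c <= b / d:
            continue  # not relatively more frequent in the target: skip
        g2 = log_likelihood(a, b, c, d)
        if g2 >= threshold:
            result[word] = g2
    return dict(sorted(result.items(), key=lambda kv: kv[1], reverse=True))


def keywords_across(target: list[str], benchmarks: dict[str, list[str]],
                    threshold: float = 3.84):
    """Hypothetical helper: one keyword list per benchmark corpus, plus the
    words that are key in every comparison."""
    lists = {name: keywords(target, bench, threshold)
             for name, bench in benchmarks.items()}
    shared = set.intersection(*(set(kw) for kw in lists.values())) if lists else set()
    return lists, shared


# Invented 100-token corpora, for illustration only.
target = ["pain"] * 30 + ["doctor"] * 20 + ["allergy"] * 10 + ["the"] * 40
benchmarks = {
    "medical": ["pain"] * 25 + ["doctor"] * 25 + ["the"] * 50,
    "general": ["pain"] + ["doctor"] + ["the"] * 60 + ["city"] * 38,
}
lists, shared = keywords_across(target, benchmarks)
for name, kws in lists.items():
    print(name, {w: round(g, 2) for w, g in kws.items()})
print("shared:", shared)
```

With these toy counts, only *allergy* is key against the hypothetical medical benchmark. The general benchmark also returns *pain* and *doctor*, so the same target corpus yields different keyword lists depending on the benchmark. Intersecting the lists is one simple way to combine several comparisons. It is an illustrative choice here, not necessarily the procedure the authors recommend.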
Keywords: keyword analysis, reference corpus, aboutness, register
Article outline
- 1. Introduction
- 1.1 Uses of keyword analysis
- 1.2 Conducting a keyword analysis
- 1.3 Research into the effects of different benchmark corpora
- 2. Methodology
- 2.1 The target corpus
- 2.2 The benchmark corpora
- 2.3 Generating the keyword lists
- 2.4 Analysing the keyword lists and the keywords
- 3. Results
- 3.1 Does the benchmark corpus influence the keywords?
- 3.2 How does the benchmark corpus influence the keyword lists as a whole?
- 3.3 How does the benchmark corpus influence the keywords?
- 4. Discussion
- References
Published online: 03 June 2021
https://doi.org/10.1075/rs.19017.poj