Geluso, Joe and Roz Hirch. 2019. The reference corpus matters: Comparing the effect of different reference corpora on keyword analysis. Register Studies 1(2): 209–242.
Article in journal
John Benjamins
This study investigates the effect that reference corpora of different registers have on the content of keyword lists. The study focusses on two target corpora and the keyword lists generated for each when using three distinct reference corpora. The two target corpora consist of published research by faculty at two PhD-granting programs in applied linguistics in North America. The reference corpora comprise published research in applied linguistics, newspaper and magazine articles, and fiction texts, respectively. The findings suggest that while common keywords representing each target corpus emerge regardless of the reference corpus used in the analysis, there are also substantial differences. Primarily, using a reference corpus of the same sub-register as the target corpus better highlights content unique to each target corpus, while using a reference corpus of a different register better uncovers words that reflect the register that the target corpora represent. Implications for conducting keyword analysis are discussed.