Corpus-based researchers and traditional qualitative researchers, such as those interested in critical discourse analysis, often need to select prototypical texts for close reading, that is, texts that exhibit the language features of interest found across a much larger corpus. Traditional approaches to this selection procedure have been largely ad hoc. In this paper, we offer a more principled way of selecting texts for close reading, based on ranking texts by the number of keywords they contain. To facilitate this analysis, we have developed a multiplatform, freeware software tool called ProtAnt that analyses the texts, generates a ranked list of keywords based on statistical significance and effect size, and then orders the texts by the number of keywords in them. We describe various experiments that demonstrate that the ProtAnt analysis is effective not only at identifying prototypical texts, but also at identifying outlier texts that may need to be removed from a target corpus.
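The ranking procedure described in the abstract can be illustrated with a minimal sketch. It is not ProtAnt's actual implementation: the abstract does not name the statistics used, so the log-likelihood significance test, the log-ratio effect size, the thresholds, the whitespace tokenizer, and all function names below are assumptions chosen for illustration. Texts are scored here by the number of distinct keywords they contain; texts at the top of the ranking are candidate prototypical texts, and texts at the bottom are candidate outliers.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization.
    return text.lower().split()


def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) for a word seen a times in a target corpus
    of c tokens and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected count in the target corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def find_keywords(target_texts, reference_texts, min_g2=3.84, min_log_ratio=1.0):
    """Return words that are significantly more frequent in the target corpus.

    target_texts maps a text name to its content; reference_texts is a
    list of strings. Both thresholds are illustrative assumptions.
    """
    target = Counter(tok for t in target_texts.values() for tok in tokenize(t))
    ref = Counter(tok for t in reference_texts for tok in tokenize(t))
    c, d = sum(target.values()), sum(ref.values())
    keys = set()
    for word, a in target.items():
        b = ref.get(word, 0)
        g2 = log_likelihood(a, b, c, d)
        # Effect size: log2 ratio of relative frequencies, with a zero
        # reference count smoothed to 0.5 to avoid division by zero.
        log_ratio = math.log2((a / c) / ((b if b else 0.5) / d))
        if g2 >= min_g2 and log_ratio >= min_log_ratio:
            keys.add(word)
    return keys


def rank_texts(target_texts, keys):
    """Order texts by the number of distinct keywords they contain."""
    scored = [
        (name, len(set(tokenize(text)) & keys))
        for name, text in target_texts.items()
    ]
    # Highest keyword count first; ties broken alphabetically by name.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Toy data, invented purely for illustration.
target_texts = {
    "a": "refugee asylum refugee border asylum refugee",
    "b": "refugee asylum border",
    "c": "football match goal football",
}
reference_texts = [
    "football match goal team football match",
    "goal team player football match goal",
]

keys = find_keywords(target_texts, reference_texts)
ranking = rank_texts(target_texts, keys)
```

In this toy corpus, text "c" shares no keywords with the rest of the target corpus and so falls to the bottom of the ranking, which is the kind of signal that could flag an outlier for removal.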