The study of swearing has expanded over the last decade, diversifying to include a wider range of data and methods of analysis. Nevertheless, certain types of data, specifically large corpora of computer-mediated communication (CMC), have not been studied extensively. In this paper, we address this gap by studying the use of swearwords in blog data, and illustrate ways of identifying swearing in a large corpus by taking context into account. This approach, based on the examination of shared and unique collocates of known expletives, helps to distinguish attestations of swearing from non-swearing uses of polysemous lexemes, and supports the analysis of overlaps in the usage and meaning of swearwords. This work therefore goes beyond basic sentiment analysis and offers new insights into the use of collocation for refining profanity filters, an issue of growing importance as online interaction becomes more widespread.
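The idea of comparing shared and unique collocates can be illustrated with a minimal sketch. It is not the procedure used in the paper: the window size, the stopword list, the toy sentence and the function names `collocates` and `compare_collocates` are all assumptions made for illustration, and raw co-occurrence counts stand in for whatever association measure a full study would apply.

```python
from collections import Counter


def collocates(tokens, node, window=1, min_freq=1, stopwords=frozenset()):
    """Return the set of words found within +/- `window` tokens of `node`.

    Hypothetical helper: uses raw co-occurrence frequency, not a
    statistical association measure.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i and tokens[j] not in stopwords:
                counts[tokens[j]] += 1
    return {word for word, freq in counts.items() if freq >= min_freq}


def compare_collocates(tokens, node_a, node_b, **kwargs):
    """Split the collocates of two node words into shared and unique sets."""
    a = collocates(tokens, node_a, **kwargs)
    b = collocates(tokens, node_b, **kwargs)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Toy data (invented): "bloody" appears once as an intensifier
# and once in its literal sense.
text = ("that was a bloody awful film and a damn awful ending "
        "but he had a bloody nose after the fight")
tokens = text.split()

result = compare_collocates(
    tokens, "bloody", "damn",
    window=1, stopwords=frozenset({"a", "the", "and"}),
)
for label in ("shared", "only_a", "only_b"):
    print(label, sorted(result[label]))
```

In this toy example, the collocate shared with a known expletive ("awful") points to the swearing use of the polysemous word, while the unique collocate ("nose") points to its literal use. On a real corpus, the same comparison would need frequency thresholds and an association measure to be meaningful.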