Research on swearing has grown over the last decade and has diversified to include a wider range of data and methods of analysis. Nevertheless, certain types of data, specifically large corpora of computer-mediated communication (CMC), have not been studied extensively. In this paper, we address this gap by studying the use of swearwords in blog data, and we illustrate ways of identifying swearing in a large corpus by taking context into account. This approach, based on the examination of shared and unique collocates of known expletives, helps to distinguish swearing from non-swearing uses of polysemous lexemes and supports the analysis of overlaps in the usage and meaning of swearwords. The work therefore goes beyond basic sentiment analysis and offers new insights into the use of collocation for refining profanity filters, an issue of growing importance as online interaction becomes more widespread.
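The idea of comparing shared and unique collocates can be illustrated with a minimal sketch. It is not the procedure used in the paper: it assumes a pre-tokenised text, a fixed word window and raw co-occurrence counts rather than a statistical association measure, and the function names (`collocates`, `shared_and_unique`), the window size and the toy sentence are all invented for illustration.

```python
from collections import Counter


def collocates(tokens, node, window=4):
    """Count words occurring within `window` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Count every token in the window except the node occurrence itself.
            counts.update(
                t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
            )
    return counts


def shared_and_unique(tokens, node_a, node_b, window=4, min_freq=1):
    """Return (shared, unique to node_a, unique to node_b) collocate sets."""
    a = {w for w, c in collocates(tokens, node_a, window).items() if c >= min_freq}
    b = {w for w, c in collocates(tokens, node_b, window).items() if c >= min_freq}
    return a & b, a - b, b - a


# Toy data, invented for illustration only (not taken from any corpus).
tokens = (
    "oh damn this traffic is awful and bloody hell this weather is awful too"
).split()

shared, only_damn, only_bloody = shared_and_unique(
    tokens, "damn", "bloody", window=2
)
print("shared:", sorted(shared))
print("only 'damn':", sorted(only_damn))
print("only 'bloody':", sorted(only_bloody))
```

In a realistic setting the raw counts would likely be replaced by an association measure and a frequency threshold. Collocates unique to one word (here, "hell" for "bloody") are the kind of contextual cue that might help separate expletive from non-expletive uses of a polysemous lexeme.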