Your blog is (the) shit
A corpus linguistic approach to the identification of swearing in computer mediated communication
The study of swearing has increased in the last decade, diversifying to include a wider range of data and methods of analysis. Nevertheless, certain types of data and specifically large corpora of computer mediated communication (CMC) have not been studied extensively. In this paper, we fill a gap in research by studying the use of swearwords in blog data, and illustrate ways of identifying swearing in a large corpus by taking context into account. This approach, based on the examination of shared and unique collocates of known expletives, facilitates the distinction of attestations of swearing from non-swearing in the case of polysemous lexemes, and the analysis of overlaps in usage and meaning of swearwords. This work therefore goes beyond basic sentiment analysis and offers new insights into the use of collocation for refining profanity filters, providing innovative perspectives on issues of growing importance as online interaction becomes more widespread.
Keywords: swearing, collocation, CMC, blogs, pragmatics
Published online: 08 September 2016
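The idea of comparing shared and unique collocates of known expletives, as described in the abstract, can be illustrated with a minimal sketch. It rests on assumptions that do not come from the paper: collocates are taken as raw co-occurrences within a fixed token window (the abstract does not say which association measure or window size is used), the function names `collocates` and `shared_and_unique` are hypothetical, and the toy sentence is invented for illustration.

```python
from collections import Counter


def collocates(tokens, node, window=2, min_freq=1):
    """Return the set of words found within `window` tokens of `node`.

    Assumption: plain co-occurrence counts with a frequency threshold,
    not a statistical association measure.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return {w for w, c in counts.items() if c >= min_freq}


def shared_and_unique(tokens, nodes, window=2, min_freq=1):
    """Split collocates into those shared by all nodes and those unique to one.

    `nodes` must contain at least one node word.
    """
    sets = {n: collocates(tokens, n, window, min_freq) for n in nodes}
    shared = set.intersection(*sets.values())
    unique = {}
    for n, s in sets.items():
        others = set().union(*(o for m, o in sets.items() if m != n))
        unique[n] = s - others
    return shared, unique


# Invented toy data: "bloody" is polysemous (intensifier vs. literal use).
text = ("this damn blog is so slow . that bloody blog is so boring . "
        "his bloody nose was red")
tokens = text.split()
shared, unique = shared_and_unique(tokens, ["damn", "bloody"])
print("shared:", sorted(shared))
print("unique to 'bloody':", sorted(unique["bloody"]))
```

In this toy case, the collocates that "bloody" shares with "damn" come from its use as an intensifier, while a unique collocate such as "nose" comes from the literal, non-swearing use. A real analysis would need a large corpus and a principled collocation measure.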