Same person, different platform
Challenges and implications for forensic authorship analysis. An exploratory study of Instagram and Twitter users
The importance of digital data in forensic contexts has been increasing continuously (e.g., Grant 2013; Layton, Watters & Dazeley 2010; Wright 2013), with individuals holding an average of 8.5 different social media accounts in 2018 (Statista 2021a). Even though numerous studies have investigated registers on social media platforms (e.g., Seargeant & Tagg 2014; Zappavigna 2013), few have attempted to describe the individual style of one and the same person across different platforms, a research gap this paper addresses with an exploratory, hypothesis-generating study. The data is drawn from Instagram and Twitter and comprises 1,800 posts from three media representatives and/or writers who hold accounts on both platforms. The results of the analysis suggest that the use of some features (e.g., emoji, hashtags) is strongly influenced by the respective platform, while other features (e.g., punctuation patterns, types of speech acts used) remain stable and thus offer promising avenues for authorship analysis.
Article outline
- 1. Introduction
- 1.1 Approaches to researching online behavior: Computer-mediated discourse (CMD)
- 1.2 Registers of social media platforms
- 1.2.1 Twitter
- 1.2.2 Instagram
- 1.3 Forensic authorship analysis & individual styles
- 2. Data
- 3. Methodology
- 3.1 Genre, register, style
- 3.2 Ethical considerations
- 3.3 Limitations
- 4. Analysis
- 4.1 Differences in individual style – Platform-dependent register
- 4.1.1 DS – Structural level (post-final periods)
- 4.1.2 OS – Level of meaning (interrogative speech acts)
- 4.1.3 MY – Level of interaction & discourse (@mentions)
- 4.2 Similarities in platform-independent individual style
- 4.2.1 DS – Structural level (missing hyphens)
- 4.2.2 MY – Level of meaning (commissive & directive speech acts)
- 4.2.3 OS – Level of interaction & discourse (emoji)
- 5. Discussion
- 6. Conclusions
- Notes
- References
References
Al-Surmi, M. (2012). Authenticity and TV shows: A multidimensional analysis perspective. TESOL Quarterly, 46(4), 671–694.
Ayers, J. W., Caputi, T. L., Nebeker, C., & Dredze, M. (2018). Don’t quote me: Reverse identification of research participants in social media studies. Digital Medicine, 1(1). [URL]
Barber, C. (1962). Some measurable characteristics of modern scientific prose. In F. Behre (Ed.), Contributions to English syntax and philology (pp. 21–43). Gothenburg: Almqvist & Wiksell.
Barlas, G., & Stamatatos, E. (2020). Cross-domain authorship attribution using pre-trained language models. In I. Maglogiannis, L. Iliadis & E. Pimenidis (Eds.), Artificial intelligence applications and innovations (pp. 255–266). Berlin: Springer.
Biber, D. (1988/1995). Variation across speech and writing. Cambridge: CUP.
Biber, D., & Conrad, S. (2009). Register, genre, style. Cambridge: CUP.
Biber, D., & Egbert, J. (2018). Register variation online. Cambridge: CUP.
boyd, D., Golder, S., & Lotan, G. (2010). Tweet, tweet, retweet: Conversational aspects of retweeting on Twitter. In Z. Papacharissi (Ed.), Networked self: Identity, community, and culture on social network sites (pp. 39–58). London: Routledge.
Brown, G., & Yule, G. (1983). Discourse analysis. Cambridge: CUP.
Bruns, A., & Moe, H. (2014). Structural layers of communication on Twitter. In K. Weller, A. Bruns, J. Burgess, M. Mahrt & C. Puschmann (Eds.), Twitter and society (pp. 15–28). New York: Peter Lang.
Clarke, I. (2019). Functional linguistic variation in Twitter trolling. The International Journal of Speech, Language and the Law, 26(1), 57–84.
Clarke, I., & Grieve, J. (2019). Stylistic variation on the Donald Trump Twitter account: A linguistic analysis of tweets posted between 2009 and 2018. PLoS ONE, 14(9), 1–27.
Constantinou, F., & Chambers, L. (2020). Non-standard English in UK students’ writing over time. Language and Education, 34(1), 22–35.
Coulthard, M. (2004). Author identification, idiolect and linguistic uniqueness. Applied Linguistics, 25(4), 431–447.
Coulthard, M., Johnson, A., & Wright, D. (2017). Introduction to forensic linguistics. London: Routledge.
Crystal, D. (2011). Internet linguistics. London: Routledge.
Crystal, D., & Davy, D. (1969). Investigating English style. London: Routledge.
Daelemans, W., Kestemont, M., Manjavacas, E., Potthast, M., Rangel, F., Rosso, P., Specht, G., Stamatatos, E., Stein, B., Tschuggnall, M., Wiegmann, M., & Zangerle, E. (2019). Overview of PAN 2019: Bots and gender profiling, celebrity profiling, cross-domain authorship attribution and style change detection. Lecture Notes in Computer Science, 11696, 402–416.
Dayter, D. (2015). Small stories and extended narratives on Twitter. Discourse, Context and Media, 10, 19–26.
De Beaugrande, R. A., & Dressler, W. U. (1981). Introduction to text linguistics. London: Longman.
Eckert, P., & Rickford, J. R. (Eds.) (2001). Style and sociolinguistic variation. Cambridge: CUP.
Fairclough, N. (1992). Discourse and social change. Cambridge: Polity Press.
Fairclough, N. (1995). Media discourse. London: Edward Arnold.
Fobbe, E. (2020). Text-linguistic analysis in forensic authorship attribution. International Journal of Language & Law, 9, 93–114.
Fobbe, E. (2022). Authorship identification. In V. Guillén-Nieto & D. Stein (Eds.), Language as Evidence: Doing forensic linguistics (pp. 185–217). Cham: Palgrave Macmillan.
Gawne, L., & McCulloch, G. (2019). Emoji as digital gestures. Language@Internet, 17. [URL]
Goldstein-Stewart, J., Winder, R., & Sabin, R. (2009). Person identification from text and speech genre samples. Proceedings of the 12th Conference of the European Chapter of the ACL, 336–344.
Gomaa, W. H., & Fahmy, A. A. (2013). A survey of text similarity approaches. International Journal of Computer Applications, 68(13), 13–18.
Grant, T. (2013). Txt 4n6: Method, consistency, and distinctiveness in the analysis of SMS text messages. Journal of Law and Policy, 21(2), 467–494.
Grant, T. (2022). The idea of progress in forensic authorship analysis. Cambridge: CUP.
Grant, T., & MacLeod, N. (2018). Resources and constraints in linguistic identity performance: A theory of authorship. Language and Law / Linguagem e Direito, 5(1), 80–96.
Grant, T., & MacLeod, N. (2020). Language and online identities. Cambridge: CUP.
Halliday, M. (2014). Introduction to Systemic Functional Grammar. London: Routledge.
Hardaker, C., & McGlashan, M. (2016). “Real men don’t hate women”: Twitter rape threats and group identity. Journal of Pragmatics, 91, 80–93.
Herring, S. (2004). Computer-mediated discourse analysis: An approach to researching online behavior. In S. A. Barab, R. Kling & J. H. Gray (Eds.), Designing for virtual communities in the service of learning (pp. 338–376). New York: CUP.
Herring, S. (2007). A faceted classification scheme for computer-mediated discourse. Language@Internet. [URL]
Hoey, M. (2005). Lexical priming: A new theory of words and language. London: Routledge.
Hogan, B. (2013). Pseudonyms and the rise of the real-name web. In J. Hartley, J. Burgess & A. Bruns (Eds.), A companion to new media dynamics (pp. 290–308). London: Wiley.
Hood, S. (2013, June). Systemic functional linguistics. Genre across borders. [URL]
Ishihara, S. (2017). Strength of linguistic text evidence: A fused forensic text comparison system. Forensic Science International, 278, 148–197.
Jiang, F. K., & Hyland, K. (2015). ‘The fact that’: Stance nouns in disciplinary writing. Discourse Studies, 17(5), 529–550.
Johnson, A., & Wright, D. (2014). Identifying idiolect in forensic authorship attribution: An n-gram textbite approach. Language and Law / Linguagem e Direito, 1(1), 37–69.
Johansson, F., Kaati, L., & Shrestha, A. (2015). Timeprints for identifying social media users with multiple aliases. Security Informatics, 4(7), 1–11.
Juola, P. (2007). Future trends in authorship attribution. In P. Craiger & S. Shenoi (Eds.), IFIP (Vol. 242): Advances in digital forensics (pp. 119–132). Boston: Springer.
Kemp, S. (2019). Digital 2019: Global internet usage accelerates. We are social. [URL]
Kestemont, M., Luyckx, K., Daelemans, W., & Crombez, T. (2012). Cross-genre authorship verification using unmasking. English Studies, 93(3), 340–356.
Killian, A., Brounstein, T., Skryzalin, J., & Garcia, D. (2017). Stylometric and temporal techniques for social media account resolution. Technical Report for Sandia National Lab, 1–8.
Kocher, M., & Savoy, J. (2017). Distance measures in author profiling. Information Processing and Management, 53, 1103–1119.
Koppel, M., Schler, J., & Argamon, S. (2011). Authorship attribution in the wild. Language Resources & Evaluation, 45, 83–94.
Krieg-Holz, U., & Bülow, L. (2016). Linguistische Stil- und Textanalyse. Tübingen: Narr.
Larner, S. (2014). A preliminary investigation into the use of fixed formulaic sequences as a marker of authorship. International Journal of Speech, Language and the Law, 21(1), 1–22.
Layton, R., Watters, P., & Dazeley, R. (2010). Authorship attribution for Twitter in 140 characters or less. IEEE, 1–8.
Leaver, T., Highfield, T., & Abidin, C. (2020). Instagram: Visual social media cultures. Cambridge: Polity Press.
MacLeod, N., & Grant, T. (2012). Whose tweet? Authorship analysis of micro-blogs and other short-form messages. In S. Tomblin, N. MacLeod, R. Sousa-Silva & M. Coulthard (Eds.), Proceedings of the International Association of Forensic Linguists’ 10th Biennial Conference (pp. 210–224). Birmingham: Aston University.
MacLeod, N., & Grant, T. (2017). “go on cam but dnt be dirty”: Linguistic levels of identity assumption in undercover online operations against child sex abusers. Language and Law / Linguagem e Direito, 4(2), 157–175.
Manovich, L. (2017). Instagram and contemporary image. Manovich. [URL]
Marko, K. (2020). Exploring the distinctiveness of emoji use for digital authorship analysis. Language and Law / Linguagem e Direito, 7(1–2), 36–55.
Marko, K. (forthcoming). “You’re a rockstar *heart eyes*” – what the functions of emoji reveal about the age and gender of their users on Instagram. Language@Internet.
Marko, K., & Sullivan Buker, G. (2022). “Hope you’re in the mood for cookies”: An exploratory study of writing styles across social media platforms. Journal of Indonesian Community for Forensic Linguistics, 1(1), 14–25.
Martin, J. R., & Rose, D. (2008). Genre relations: Mapping culture. London: Equinox.
McCulloch, G. (2019). Because Internet. New York: Riverhead Books.
McMenamin, G. R. (2010). Forensic stylistics. Theory and practice of forensic stylistics. In M. Coulthard & A. Johnson (Eds.), The Routledge handbook of forensic linguistics (pp. 487–507). New York: Routledge.
Myers, G. (2010). The discourse of blogs and wikis. London: Continuum.
Nini, A. (2017). Register variation in malicious forensic texts. International Journal of Speech, Language and the Law, 24(1).
Nini, A. (2018). An authorship analysis of the Jack the Ripper letters. Digital Scholarship in the Humanities, 33(3), 621–636.
Nini, A., & Grant, T. (2013). Bridging the gap between stylistic and cognitive approaches to authorship analysis using Systemic Functional Linguistics and multidimensional analysis. The International Journal of Speech, Language and the Law, 20(2), 173–202.
Orebaugh, A., & Allnutt, J. (2009). Classification of instant messaging communications for forensic analysis. The International Journal of Forensic Computer Science, 1, 22–28.
Overdorf, R., Dutko, T., & Greenstadt, R. (2014). Blogs and Twitter feeds: A stylometric environmental impact study. Proceedings of Privacy Enhancing Technologies Symposium. [URL]
Overdorf, R., & Greenstadt, R. (2016). Blogs, Twitter feeds, and Reddit comments: Cross-domain authorship attribution. Proceedings on Privacy Enhancing Technologies, 3, 155–171.
Page, R. (2011). Stories and social media: Identities and interaction. London: Routledge.
Page, R. (2012). The linguistics of self-branding and micro-celebrity in Twitter: The role of hashtags. Discourse & Communication, 6(2), 181–201.
Page, R., Barton, D., Unger, J. W., & Zappavigna, M. (2014). Researching language and social media. London: Routledge.
Peddinti, S. T., Ross, K. W., & Cappos, J. (2017). User anonymity on Twitter. Sociotechnical Security and Privacy, 84–57.
R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. [URL]
Rheindorf, M., & Wodak, R. (2019). Genre-related language change: Discourse- and corpus-linguistic perspectives on Austrian German 1970–2010. Folia Linguistica, 53(1), 125–167.
Rocha, A., Scheirer, W., Forstall, C., Cavalcante, T., Theophilo, A., Shen, B., Carvalho, A. R. B., & Stamatatos, E. (2016). Authorship attribution for social media forensics. IEEE Transactions on Information Forensics and Security, 12(1), 1–30.
Scott, K. (2018). “Hashtags work everywhere”: The pragmatic functions of spoken hashtags. Discourse, Context & Media, 22, 57–64.
Seargeant, P. (2019). The emoji revolution: How technology is shaping the future of communication. Cambridge: CUP.
Seargeant, P., & Tagg, C. (2014). The language of social media. London: Palgrave Macmillan.
Schoch, K. W. (2016). Case study research. In G. J. Burkholder, K. A. Cox & L. M. Crawford (Eds.), The scholar-practitioner’s guide to research design (pp. 227–241). Baltimore: Laureate.
Siever, C. M., & Siever, T. (2019). Emoji-text relations on Instagram. In C. M. Siever, T. Siever & H. Stöckl (Eds.), Shifts toward image-centricity in contemporary multimodal practices (pp. 177–203). London: Routledge.
Sinclair, J. M. (1991). Corpus, concordance, collocation. Oxford: OUP.
Sing, C. (2016). Writing for specific purposes: Developing business students’ ability to ‘technicalize’. In S. Göpferich & I. Neumann (Eds.), Developing and assessing academic and professional writing skills (pp. 15–44). Frankfurt: Peter Lang.
Sousa Silva, R., Laboreiro, G., Sarmento, L., Grant, T., Oliveira, E., & Maia, B. (2011). ‘twazn me!!! ;(’ Automatic authorship analysis of micro-blogging messages. In R. Muñoz, A. Montoyo & E. Métais (Eds.), Natural language processing and information systems (pp. 161–168). Berlin: Springer.
Stamatatos, E. (2013). On the robustness of authorship attribution based on character n-gram features. Journal of Law and Policy, 421–439.
Statista (2021a). Average number of social media accounts per Internet user from 2013 to 2018. [URL]
Statista (2021b). Most popular social networks worldwide as of January 2021. [URL]
Stone, B. (2009, November 19th). What’s happening? Twitter. [URL]
Tagg, C. (2015). Exploring digital communication. London: Routledge.
Turell, M. T. (2010). The use of textual, grammatical and sociolinguistic evidence in forensic text comparison. International Journal of Speech, Language & the Law, 17(2), 211–250.
Twitter, Inc. (2022). About public and protected Tweets. [URL]
Veum, A., & Undrum, L. V. M. (2018). The selfie as a global discourse. Discourse & Society, 29(1), 86–103.
Wright, D. (2013). Stylistic variation within genre conventions in the Enron email corpus. The International Journal of Speech, Language and the Law, 20(1), 45–75.
Wright, D. (2021). Corpus approaches to forensic linguistics. In M. Coulthard & R. Sousa-Silva (Eds.), The Routledge handbook of forensic linguistics (pp. 611–627). London: Routledge.
Zappavigna, M. (2013). Discourse of Twitter and social media. London: Bloomsbury.
Zhang, M. (2016). A multidimensional analysis of metadiscourse markers across written registers. Discourse Studies, 18(2), 204–222.
Zote, J. (2021, July 26th). How long should social posts be? Try this social media character counter. Sprout social. [URL]