A text typology of social media
This paper introduces an initial text typology of social media posts from a multi-dimensional (MD) perspective.
Text types are “[g]roupings of text that are similar in their linguistic form” (Biber 1989: 13). The typology rests on a new MD analysis of social media messages, presented in this paper. The corpus consists of 60,000 English-language social media messages compiled from Facebook, Twitter, Instagram, Reddit, Telegram, and YouTube. After the texts were cleaned, the corpus was tagged with the Biber Tagger and post-processed with the Biber Tag Count program. Three dimensions of variation were identified, each representing an underlying parameter of variation. Once the texts had been scored on each dimension, a k-means cluster analysis was carried out, and the optimal number of clusters was determined with the Cubic Clustering Criterion (CCC) statistic. A two-way typology, comprising two text types, was then developed from the dimensional characteristics of each cluster and from careful qualitative analysis of text samples.
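The abstract compresses several computational steps into two sentences. As a rough illustration only, the sketch below (pure Python, standard library) shows the general MD recipe. First, normed feature counts are converted to z-scores across texts. These are then summed into dimension scores, with positive-loading features added and negative-loading features subtracted, following the procedure commonly used in MD studies (Biber 1988). Finally, texts are clustered on those scores with k-means. Everything specific is a hypothetical assumption: the feature names, which features belong to which dimension, the toy rates, k = 2 (chosen only to mirror the two text types), and the farthest-first initialization. The sketch does not reproduce the paper's factor analysis or the Biber Tagger output. It also does not implement the Cubic Clustering Criterion (Sarle 1983) that the paper used to choose the number of clusters.

```python
"""Minimal sketch of two MD steps: dimension scores, then k-means clustering.

Hypothetical throughout: feature names, dimension memberships, toy rates,
and k = 2. The CCC-based choice of k is NOT implemented; k is fixed by hand.
"""
import math
import random
from statistics import mean, pstdev


def standardize(texts):
    """Turn each feature's normed rate into a z-score across all texts."""
    features = list(texts[0])
    stats = {}
    for f in features:
        values = [t[f] for t in texts]
        stats[f] = (mean(values), pstdev(values))
    return [
        {f: (t[f] - stats[f][0]) / stats[f][1] if stats[f][1] else 0.0 for f in features}
        for t in texts
    ]


def dimension_score(z, positive, negative):
    """Biber-style score: summed z-scores of positive features minus negative ones."""
    return sum(z[f] for f in positive) - sum(z[f] for f in negative)


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means with a farthest-first initialization (an assumption)."""
    rng = random.Random(seed)
    centroids = [points[rng.randrange(len(points))]]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))

    def nearest(p):
        return min(range(k), key=lambda i: math.dist(p, centroids[i]))

    for _ in range(iters):
        labels = [nearest(p) for p in points]
        new = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            # Keep the old centroid if a cluster happens to empty out.
            new.append(tuple(mean(xs) for xs in zip(*members)) if members else centroids[i])
        if new == centroids:
            break
        centroids = new
    return [nearest(p) for p in points], centroids


# Hypothetical normed rates (per 1,000 words) for six toy posts.
TEXTS = [
    {"nouns": 300, "attrib_adj": 80, "pron_1p2p": 10, "contractions": 2},
    {"nouns": 290, "attrib_adj": 75, "pron_1p2p": 12, "contractions": 3},
    {"nouns": 310, "attrib_adj": 85, "pron_1p2p": 8, "contractions": 1},
    {"nouns": 150, "attrib_adj": 20, "pron_1p2p": 90, "contractions": 40},
    {"nouns": 160, "attrib_adj": 25, "pron_1p2p": 85, "contractions": 45},
    {"nouns": 140, "attrib_adj": 22, "pron_1p2p": 95, "contractions": 38},
]
# Hypothetical dimensions: (positive-loading features, negative-loading features).
DIMENSIONS = {
    "dim1": (["nouns", "attrib_adj"], ["pron_1p2p"]),
    "dim2": (["contractions"], []),
}

z = standardize(TEXTS)
scores = [tuple(dimension_score(zt, pos, neg) for pos, neg in DIMENSIONS.values()) for zt in z]
labels, centroids = kmeans(scores, k=2)
print(labels)
```

The returned `centroids` are the mean dimension scores of each cluster. These correspond to the "dimensional characteristics of each cluster" on which, according to the abstract, the typology was based. In the paper, that interpretation is also supported by qualitative reading of text samples.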
Article outline
- 1. Introduction
- 2. Literature review
- 3. Methods
- 4. Dimensions of variation
- 4.1 Dimension 1: Formal, prepared, informational communication
- 4.2 Dimension 2: Informal, interactive, stance-marked discourse
- 4.3 Dimension 3: Expression of personal attitudes and feelings
- 4.4 Text length
- 5. Text types
- 5.1 Text type 1: Objective exposition
- 5.2 Text type 2: Subjective expression
- 5.3 Text type distribution
- 6. Discussion and final remarks
- Acknowledgements
- Note
References
Adam, J. M. (2011). A linguística textual – Introdução à análise textual dos discursos [La linguistique textuelle. Introduction à l’analyse textuelle des discours] (M. D. G. Rodrigues, J. G. D. Silva Neto, L. Passeggi, & E. F. Leurquin, Trans.). São Paulo: Cortez.
Beaugrande, R. A. D., & Dressler, W. U. (1981). Introduction to text linguistics. London: Longman.
Berber Sardinha, T. (2014). Comparing Internet and pre-Internet registers. In T. Berber Sardinha & M. Veirano Pinto (Eds.), Multi-dimensional analysis, 25 years on: A tribute to Douglas Biber (pp. 81–107). Amsterdam/Philadelphia: John Benjamins.
Berber Sardinha, T. (2017). Text types in Brazilian Portuguese: A multidimensional perspective. Corpora, 12(3), 483–515.
Berber Sardinha, T. (2022). Corpus linguistics and the study of social media: A case study using multi-dimensional analysis. In A. O’Keeffe & M. McCarthy (Eds.), The Routledge handbook of corpus linguistics (pp. 656–674). New York: Routledge.
Berber Sardinha, T., Kauffmann, C., & Acunzo, C. M. (2014). Dimensions of register variation in Brazilian Portuguese. In T. Berber Sardinha & M. Veirano Pinto (Eds.), Multi-dimensional analysis, 25 years on: A tribute to Douglas Biber (pp. 35–80). Amsterdam/Philadelphia: John Benjamins.
Berber Sardinha, T., & Shimazumi, M. (2021). A text typology of argumentative essays based on the new ICLE v.3. Paper presented at the 11th International Corpus Linguistics Conference 2021, Limerick, Ireland.
Berber Sardinha, T., & Veirano Pinto, M. (Eds.). (2019). Multi-dimensional analysis: Research methods and current issues. London: Bloomsbury Academic.
Biber, D. (1988). Variation across speech and writing. Cambridge: Cambridge University Press.
Biber, D. (1989). A typology of English texts. Linguistics, 27(1), 3–43.
Biber, D. (1993). Representativeness in corpus design. Literary and Linguistic Computing, 8(4), 243–257.
Biber, D. (1995). Dimensions of register variation: A cross-linguistic comparison. Cambridge: Cambridge University Press.
Biber, D., & Egbert, J. (2018). Register variation online. Cambridge: Cambridge University Press.
Biber, D., & Kurjian, J. (2007). Towards a taxonomy of web registers and text types: A multi-dimensional analysis. In M. Hundt, N. Nesselhauf, & C. Biewer (Eds.), Corpus linguistics and the web (pp. 109–132). Amsterdam/New York: Rodopi.
Bronckart, J. P. (1999). Atividades de linguagem, discursos e textos [Language activities, discourses and texts] (A. R. Machado, Trans.). São Paulo: EDUC.
Charaudeau, P. (2009). Linguagem e discurso: Modos de organização [Langage et Discours – Eléments de sémiolinguistique] (A. M. S. Correa, Trans.). São Paulo: Contexto.
Clarke, I. (2020). Linguistic variation across Twitter and Twitter trolling (PhD dissertation). University of Birmingham, Birmingham.
Clarke, I. (2022). A multi-dimensional analysis of English tweets. Language and Literature. Advance online publication.
Clarke, I., & Grieve, J. (2019). Stylistic variation on the Donald Trump Twitter account: A linguistic analysis of tweets posted between 2009 and 2018. PLOS ONE, 14(9), e0222062.
Egbert, J., & Staples, S. (2019). Doing multi-dimensional analysis in SPSS, SAS, and R. In T. Berber Sardinha & M. Veirano Pinto (Eds.), Multi-dimensional analysis: Research methods and current issues (pp. 125–144). London/New York: Bloomsbury Academic.
Fairchild, C. (2007). Building the authentic celebrity: The ‘idol’ phenomenon in the attention economy. Popular Music and Society, 30(3), 355–375.
Friginal, E., & Hardy, J. A. (2014). Conducting multi-dimensional analysis using SPSS. In T. Berber Sardinha & M. Veirano Pinto (Eds.), Multi-dimensional analysis, 25 years on: A tribute to Douglas Biber (pp. 298–316). Amsterdam/Philadelphia: John Benjamins.
Friginal, E., & Hardy, J. (2019). From factors to dimensions: Interpreting linguistic co-occurrence patterns. In T. Berber Sardinha & M. Veirano Pinto (Eds.), Multi-dimensional analysis: Research methods and current issues (pp. 145–164). London: Bloomsbury Academic.
Friginal, E., Waugh, O., & Titak, A. (2018). Linguistic variation in Facebook and Twitter posts. In E. Friginal & J. A. Hardy (Eds.), Studies in corpus-based sociolinguistics (pp. 342–362). London: Routledge.
Goulart, L., & Wood, M. (2019). Methodological synthesis of research using multi-dimensional analysis. Journal of Research Design and Statistics in Linguistics and Communication Science, 6(2), 107–137.
Gray, B. (2019). Tagging and counting linguistic features for multi-dimensional analysis. In T. Berber Sardinha & M. Veirano Pinto (Eds.), Multi-dimensional analysis: Research methods and current issues (pp. 43–66). London/New York: Bloomsbury Academic.
Holgado-Tello, F. P., Chacon-Moscoso, S., Barbero-Garcia, I., & Vila-Abad, E. (2010). Polychoric versus Pearson correlations in exploratory and confirmatory factor analysis of ordinal variables. Quality & Quantity, 44(1), 153–166.
Longacre, R. E. (1983). The grammar of discourse. New York: Plenum Press.
Marwick, A. (2015). Instafame: Luxury selfies in the attention economy. Public Culture, 27(1), 137–160.
McCulloch, G. (2019). Because Internet: Understanding the new rules of language. New York: Riverhead Books.
O’Halloran, K. (2022). Posthumanism and corpus linguistics. In A. O’Keeffe & M. McCarthy (Eds.), The Routledge handbook of corpus linguistics (pp. 675–692). New York: Routledge.
Prina Dutra, D., & Berber Sardinha, T. (2018). A linguistic typology of sections in research articles: A multi-dimensional perspective. Paper presented at the Arizona Corpus Linguistics Conference (AZCL), Flagstaff, AZ, USA.
Sarle, W. S. (1983). Cubic clustering criterion. Cary: SAS Institute Inc.
Shulman, D. (2017). The presentation of self in contemporary social life. Los Angeles: Sage.
Tannen, D. (1982). Oral and literate strategies in spoken and written narratives. Language, 58(1), 1–21.
Titak, A., & Roberson, A. (2013). Dimensions of web registers: An exploratory multi-dimensional comparison. Corpora, 8(2), 235–260.
van der Goot, R. (2019). MoNoise: A multilingual and easy-to-use lexical normalization tool. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations (pp. 201–206). Florence: ACL.
Werlich, E. (1983). A text grammar of English. Heidelberg: Quelle & Meyer.
Cited by (1)
Shakir, M., & Deuber, D. (2023). Compiling a corpus of South Asian online Englishes: A report, some reflections and a pilot study. ICAME Journal, 47(1), 119 ff.
This list is based on CrossRef data as of 26 December 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.