Article in: Register Studies: Online-First Articles
Linguistic variation beyond the Indo-European web
Analyzing Turkish web registers in TurCORE
A register, defined as a text variety with specific situational characteristics and a communicative purpose (Biber & Conrad 2019), is also recognized as a cultural construct (Biber & Egbert 2023). Registers merit thorough investigation because they reflect both linguistic and cultural landscapes. However, existing studies predominantly focus on Indo-European languages. This study investigates Turkish web registers by introducing the Turkish Corpus of Online Registers (TurCORE). TurCORE comprises 2,780 web texts, manually annotated with a register taxonomy that targets the entire unrestricted web and identifies 24 web register categories. Using Text Dispersion Keyword Analysis (Egbert & Biber 2019), the study examines register characteristics with a specific focus on news reports, interactive discussions, and recipes, and compares them with their English equivalents. Results reveal parallels between Turkish and English news reports, while Turkish interactive discussions and recipes exhibit distinctive language- and culture-specific features.
Keywords: web registers, Turkish, manual register annotation, Text Dispersion Keyword Analysis, linguistic analysis of web registers
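As a rough illustration of the idea behind Text Dispersion Keyword Analysis, the sketch below scores words by the number of texts they occur in, rather than by raw token frequency, and compares a target register against a reference set of texts. It assumes a two-term log-likelihood statistic computed over text counts and pre-tokenized input; the function name `text_dispersion_keyness` and the toy data are hypothetical and are not taken from the article or from any TurCORE tooling.

```python
import math


def text_dispersion_keyness(target_texts, reference_texts):
    """Rank words by log-likelihood keyness computed over text counts.

    Each text is a list of tokens. A word counts once per text, so the
    score reflects dispersion across texts, not token frequency.
    """

    def text_counts(texts):
        counts = {}
        for tokens in texts:
            for word in set(tokens):  # one count per text
                counts[word] = counts.get(word, 0) + 1
        return counts

    n_target, n_ref = len(target_texts), len(reference_texts)
    in_target = text_counts(target_texts)
    in_ref = text_counts(reference_texts)

    scores = {}
    for word, a in in_target.items():
        b = in_ref.get(word, 0)
        # Keep only words that are proportionally more dispersed in the target.
        if a / n_target <= b / n_ref:
            continue
        # Expected text counts if the word were equally dispersed in both sets.
        exp_target = n_target * (a + b) / (n_target + n_ref)
        exp_ref = n_ref * (a + b) / (n_target + n_ref)
        ll = a * math.log(a / exp_target)
        if b:
            ll += b * math.log(b / exp_ref)
        scores[word] = 2 * ll

    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy example: three "recipe" texts against four "news" texts.
recipes = [["tuz", "ekle"], ["tuz", "karistir"], ["tuz", "un"]]
news = [["haber", "dedi"], ["haber", "bugun"], ["dedi", "bugun"], ["haber", "dedi"]]
keywords = text_dispersion_keyness(recipes, news)
```

In this toy setup, a word that appears in every target text and in no reference text ranks above words that appear in a single target text, which is the behaviour a dispersion-based measure is meant to capture.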
Article outline
- 1. Introduction
- 2. Previous work
- 2.1 Registers
- 2.2 Turkish
- 3. Methodology
- 3.1 Data and annotation process
- 3.2 Keyword analysis
- 4. Findings: Registers of TurCORE
- 4.1 Informational Persuasion (IP)
- 4.2 Narrative (NA)
- 4.3 Informational Description (IN)
- 4.4 Opinion (OP)
- 4.5 How-to/Instructions (HI)
- 4.6 Interactive discussion
- 5. Findings: Lexical and grammatical analyses
- 5.1 News reports
- 5.2 Interactive discussion
- 5.3 Recipes
- 6. Conclusion
- Acknowledgements
- Appendix: Abbreviations in Grammatical Annotations
- Author queries
- References
This content is being prepared for publication; it may be subject to changes.
References (43)
Akbas, E. (2014). Are they discussing in the same way? Interactional metadiscourse in Turkish writers’ texts. In A. Łyda & K. Warchał (Eds.), Occupying niches: Interculturality, cross-culturality and aculturality in academic research (pp. 119–133). Springer International Publishing.
Alazzawie, A. (2022). The linguistic and situational features of WhatsApp messages among high school and university Canadian students. SAGE Open, 12(1).
Arpaci, I., & Baloğlu, M. (2016). The impact of cultural collectivism on knowledge sharing among information technology majoring undergraduates. Computers in Human Behavior, 56, 65–71.
Asheghi, N., Sharoff, S., & Markert, K. (2016). Crowdsourcing for web genre annotation. Language Resources and Evaluation, 50(3), 603–641.
Ayçiçegi-Dinn, A., & Caldwell-Harris, C. (2011). Individualism–collectivism among Americans, Turks and Turkish immigrants to the U.S. International Journal of Intercultural Relations, 35(1), 9–16.
Baker, P. (2004). Querying keywords: Questions in difference, frequency, and sense in keyword analysis. Journal of English Linguistics, 32(4), 346–359.
Barbaresi, A. (2021). Trafilatura: A web scraping library and command-line tool for text discovery and extraction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations (pp. 122–131).
Berber-Sardinha, T. (2018). Dimensions of variation across Internet registers. International Journal of Corpus Linguistics, 23(2), 125–157.
Biber, D. (1995). Dimensions of register variation: A cross-linguistic perspective. Cambridge: Cambridge University Press.
Biber, D. (2012). Register as a predictor of linguistic variation. Corpus Linguistics and Linguistic Theory, 8(1), 9–37.
Biber, D., & Egbert, J. (2016). Register variation on the searchable web: A multi-dimensional analysis. Journal of English Linguistics, 44(2), 95–137.
Biber, D., & Conrad, S. (2019). Register, genre, and style (2nd ed.). Cambridge: Cambridge University Press.
Biber, D., & Egbert, J. (2023). What is a register? Accounting for linguistic and situational variation within — and outside of — textual varieties. Register Studies, 5(1), 1–22.
Can, T., & Cangir, H. (2019). A corpus-assisted comparative analysis of self-mention markers in doctoral dissertations of literary studies written in Turkey and the UK. Journal of English for Academic Purposes, 42, 1–14.
Can, H., & Hatipoğlu, Ç. (2023). Cultural conceptualization of congratulatory happy events in British English and Turkish: A cross-cultural perspective. Journal of Cognition and Culture, 23(3), 289–309.
Candarli, D. (2022). Linguistic characteristics of online academic forum posts across subregisters, L1 backgrounds, and grades. Lingua, 267, 103190.
Egbert, J., Biber, D., & Davies, M. (2015). Developing a bottom-up, user-based method of web register classification. Journal of the Association for Information Science and Technology, 66(9), 1817–1831.
Egbert, J., & Biber, D. (2019). Incorporating text dispersion into keyword analyses. Corpora, 14(1), 77–104.
Erten, S. (2019). Corpus profiles of Turkish mental verbs with reference to Pattern Grammar and Corpus-Assisted Discourse Studies (Master thesis). Retrieved from [URL]
Gries, S. (2021). A new approach to (key) keyword analysis: Using frequency, and now also dispersion. Research in Corpus Linguistics, 9(2), 1–33.
Kaya, E. K., & Yağlı, E. (2023). Recontextualization of the arguments of ‘innocence’ by a football club on Turkish newsprint media. Text & Talk.
Koçak, A. (2013). A comparative register analysis of the language of cooking used in Turkish recipes (Master thesis). Retrieved from [URL]
Laippala, V., Kyllönen, R., Egbert, J., Biber, D., & Pyysalo, S. (2019). Toward multilingual identification of online registers. In Proceedings of the 22nd Nordic Conference on Computational Linguistics (pp. 292–297). [URL]
Li, L., Li, A., Song, X., Li, X., Huang, K., & Ye, E. M. (2023). Characterizing response quantity on academic social Q&A sites: A multidiscipline comparison of linguistic characteristics of questions. Library Hi Tech, 41(3), 921–938.
Liimatta, A. (2019). Exploring register variation on Reddit. A multi-dimensional study of language use on social media website. Register Studies, 1(2), 269–295.
Liimatta, A. (2022). Do registers have different functions for text length? A case study of Reddit. Register Studies, 4(2), 263–287.
Olfert, H. (2023). The concept of register in heritage language retention. Register Studies, 5(1), 52–81.
Özyıldırım, I. (2011). A comparative register perspective on Turkish legislative language. In T. Salmi-Tolonen, I. Tukiainen, & R. Foley (Eds.), Law and language in partnership and conflict (pp. 79–94). Turku, Finland: Lapland Law Review.
Pomikalek, J. (2011). Removing boilerplate and duplicate content from web corpora (Doctoral dissertation), Masaryk University, Faculty of Informatics, Czech Republic.
Repo, L., Skantsi, V., Rönnqvist, S., Hellström, S., Oinonen, M., Salmela, A., Biber, D., Egbert, J., Pyysalo, S., & Laippala, V. (2021). Beyond the English web: Zero-shot cross-lingual and lightweight monolingual classification of registers. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, 183–191.
Scott, M., & Tribble, C. (2006). Textual patterns: Keywords and corpus analysis in language education. Amsterdam: John Benjamins.
Sharoff, S. (2021). Genre annotation for the Web: Text-external and text-internal perspectives. Register Studies, 3(1), 1–32.
Skantsi, V., & Laippala, V. (2023). Analyzing the unrestricted Web: The Finnish corpus of online registers. Nordic Journal of Linguistics, 1(1), 1–31.
Staples, S., Egbert, J., Biber, D., & Conrad, S. (2015). Register variation: A corpus approach. In D. Tannen, H. E. Hamilton, & D. Schiffrin (Eds.), The handbook of discourse analysis (2nd ed., pp. 505–525). Wiley Blackwell.
Taavitsainen, I. (2001). Middle English recipes: Genre characteristics, text type features and underlying traditions of writing. Journal of Historical Pragmatics, 2(1), 85–113.