This paper investigates the contribution of author/idiolect vs. register/type-of-text – as the most salient factors
influencing the final shape of a text – towards explaining the variation observed in Czech texts. Since it is almost impossible to explore
the effect of these factors on authentic data, we used elicited letters collected in a fully crossed experimental design (a representative
sample of 200 authors × four elicitation scenarios serving as a proxy for register variation). The variation encompassed by the elicited
texts is analyzed through the lens of a general-purpose multi-dimensional model of Czech. Using triangulation via three established
statistical methods and one devised for the purpose of this study, we find that register matters a great deal, explaining 1.5 times as much
variation overall as idiolect. This finding should be taken into account when designing research in sociolinguistics or in variation studies
more generally.
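To illustrate what comparing the variation explained by two crossed factors can look like, the following is a minimal sketch of a sum-of-squares decomposition for a fully crossed design with one text per author-scenario cell. It is not one of the methods used in the study, which the abstract does not name. The function name `crossed_variance_shares`, the synthetic scores, and the effect sizes are all assumptions made for illustration only.

```python
import numpy as np


def crossed_variance_shares(scores):
    """Split total variation into author, scenario and residual shares.

    `scores` is a 2-D array with one row per author and one column per
    elicitation scenario (one observation per cell), e.g. the score of
    each text on a single dimension. Hypothetical helper, not the
    procedure from the paper.
    """
    scores = np.asarray(scores, dtype=float)
    n_authors, n_scenarios = scores.shape
    grand_mean = scores.mean()

    # Sums of squares for the two crossed factors and the total.
    ss_total = ((scores - grand_mean) ** 2).sum()
    ss_author = n_scenarios * ((scores.mean(axis=1) - grand_mean) ** 2).sum()
    ss_scenario = n_authors * ((scores.mean(axis=0) - grand_mean) ** 2).sum()
    ss_residual = ss_total - ss_author - ss_scenario

    return {
        "author": ss_author / ss_total,
        "scenario": ss_scenario / ss_total,
        "residual": ss_residual / ss_total,
    }


# Synthetic data only: 200 authors x 4 scenarios, with made-up effects.
rng = np.random.default_rng(0)
author_effect = rng.normal(0.0, 1.0, size=(200, 1))
scenario_effect = np.array([[-1.5, -0.5, 0.5, 1.5]])
noise = rng.normal(0.0, 1.0, size=(200, 4))
scores = author_effect + scenario_effect + noise

shares = crossed_variance_shares(scores)
ratio = shares["scenario"] / shares["author"]
print(shares, ratio)
```

The ratio of the scenario share to the author share is the kind of quantity behind a statement such as "register explains 1.5 times as much variation as idiolect", although the figures printed here come from invented data and carry no empirical meaning.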