Degrees of non-standardness: Feature-based analysis of variation in a Torlak dialect corpus

Vuković, Teodora; Escher, Anastasia; Sonnenhauser, Barbara

doi:10.1075/ijcl.20014.vuk

Article published in:

International Journal of Corpus Linguistics
Vol. 27:2 (2022), pp. 220–247

Teodora Vuković | University of Zurich

Anastasia Escher | University of Zurich

Barbara Sonnenhauser | University of Zurich

This article presents a corpus-based method for assessing the range of dialect-standard variation and for identifying the samples with the highest prevalence of dialect features. The method provides insight into areal and inter-speaker variation and allows the extraction of maximally non-standard manifestations of the dialect, which can then be sampled and used to study language change and variation. The focus is on a non-standard Torlak variety that has undergone considerable change under the influence of standard Serbian. The degree of variation is assessed by measuring the frequencies of five distinguishing linguistic features: accent position, the dative reflexive si, auxiliary omission in the compound perfect, the post-positive article, and analytic case marking of the indirect object and possessive. Hierarchical clustering reveals the locations subject to the greatest and least influence of the standard. Positive correlations between the features' frequencies of occurrence reveal which non-standard feature is the best predictor of the others.
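The pipeline summarized in the abstract (per-location feature frequencies, hierarchical clustering, and correlation between features) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the location names, the frequency values, the Euclidean distance, and the average-linkage criterion are all assumptions chosen for illustration, and the numbers are invented placeholders rather than corpus data.

```python
from itertools import combinations
from math import sqrt

# Hypothetical feature labels and INVENTED relative frequencies (0..1) of the
# non-standard variant per location; these are placeholders, not corpus data.
FEATURES = ["accent", "si", "aux_omission", "article", "analytic_case"]
FREQS = {
    "loc_A": [0.90, 0.80, 0.70, 0.85, 0.75],
    "loc_B": [0.85, 0.75, 0.65, 0.80, 0.70],
    "loc_C": [0.20, 0.30, 0.25, 0.10, 0.15],
    "loc_D": [0.25, 0.35, 0.20, 0.15, 0.20],
}


def euclidean(u, v):
    """Euclidean distance between two frequency vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def linkage_distance(c1, c2, data):
    """Average linkage: mean pairwise distance between cluster members."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(euclidean(data[a], data[b]) for a, b in pairs) / len(pairs)


def agglomerate(data, k):
    """Repeatedly merge the two closest clusters until k clusters remain."""
    clusters = [frozenset([name]) for name in data]
    while len(clusters) > k:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: linkage_distance(pair[0], pair[1], data),
        )
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def best_predictor(data, features):
    """Feature with the highest mean correlation with all other features."""
    cols = [[row[i] for row in data.values()] for i in range(len(features))]

    def mean_r(i):
        rs = [pearson(cols[i], cols[j]) for j in range(len(cols)) if j != i]
        return sum(rs) / len(rs)

    return max(features, key=lambda f: mean_r(features.index(f)))


if __name__ == "__main__":
    for cluster in agglomerate(FREQS, 2):
        print(sorted(cluster))
    print("best predictor (toy data):", best_predictor(FREQS, FEATURES))
```

With real data, each row would hold the measured frequencies for one location or speaker, and a statistics package could replace the hand-written clustering and correlation helpers.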

Keywords: linguistic variation, corpus-based dialectometry, endangered languages, spoken language, Torlak

Article outline

1. Introduction
2. Variation in Torlak
- 2.1 Dimensions of variation
- 2.2 Assessing variation
3. Torlak features chosen for analysis
- 3.1 Selection
- 3.2 Accent position
- 3.3 The clitic si
- 3.4 Omission of 3rd person auxiliary with l-perfect
- 3.5 Post-positive article
- 3.6 Analytic dative marking of the possessive and indirect object
- 3.7 Operationalization
4. The Timok sample
5. Measuring variation
- 5.1 Analysis
- 5.2 Results
6. Discussion
7. Conclusion
Acknowledgements
Notes
References

Published online: 20 May 2022
