Agreement mismatches in Dutch relatives
This paper investigates agreement mismatches in Dutch relatives. While the norm
is that singular neuter nouns occur with the relative pronoun
dat ‘that’, it is by now quite common to find neuter nouns
combining with the relative pronoun die. A large Twitter corpus
is used to study which linguistic variables make the use of die ‘that’ more
likely in this context. Lack of agreement between neuter noun and relative
pronoun is very frequent in this corpus (37.5% of the cases, 46.8% if the
preceding determiner is indefinite). Non-agreement is most common for nouns that
are high in the animacy ranking, but it also occurs with other semantic classes,
and there is considerable lexical variation. Young, female users have a
stronger tendency to use non-agreeing relative pronouns. Contrary to what
previous work suggests, we do not find that users with a Moroccan or Turkish
background have a stronger tendency towards non-agreement. A comparison of
tweets with agreeing and non-agreeing pronouns and a comparison of the Twitter
corpus with web data both suggest that non-agreement is characteristic of
informal language use.
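The non-agreement percentages above can be read as proportions of relative clauses with a singular neuter antecedent that use die rather than dat. As a minimal sketch only, assuming each such clause has already been reduced to a (determiner type, relative pronoun) pair, the rate is the share of die among die and dat, optionally restricted to one determiner type. The record format, the function name non_agreement_rate and the toy records are hypothetical illustrations; they are not the study's corpus pipeline or data.

```python
from collections import Counter

# Hypothetical record format (an assumption, not the study's pipeline):
# one (determiner_type, relative_pronoun) pair per relative clause whose
# antecedent is a singular neuter noun. determiner_type is "definite" or
# "indefinite"; relative_pronoun is "dat" (agreeing) or "die" (non-agreeing).


def non_agreement_rate(records, determiner=None):
    """Share of neuter-antecedent relatives that use 'die' instead of 'dat'.

    If `determiner` is given, only records with that determiner type count.
    Pronouns other than 'dat' and 'die' are ignored in the denominator.
    """
    counts = Counter(
        pronoun
        for det, pronoun in records
        if determiner is None or det == determiner
    )
    total = counts["dat"] + counts["die"]
    if total == 0:
        raise ValueError("no matching records")
    return counts["die"] / total


# Toy input invented for illustration only; not data from the study.
toy = [
    ("definite", "dat"), ("definite", "dat"), ("definite", "die"),
    ("indefinite", "die"), ("indefinite", "dat"),
]

print(f"{non_agreement_rate(toy):.1%}")                # all determiners
print(f"{non_agreement_rate(toy, 'indefinite'):.1%}")  # indefinite only
```

On the toy records this prints 40.0% overall and 50.0% for indefinite determiners; these numbers say nothing about the actual Twitter corpus.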
Article outline
- 1. Introduction
- 2. Previous work
- 2.1 Dutch nominal agreement
- 2.2 Language use on social media
- 3. Corpus construction
- 4. Linguistic variation
- 5. Demographic variation
- 6. Formality
- 7. Conclusion
- Notes
- References
References
Alis, Christian M., and May T. Lim
2013 “Spatio-Temporal Variation of Conversational Utterances on Twitter.”
PLOS ONE 8 (10): e77793.
Argamon, Shlomo, Moshe Koppel, James W. Pennebaker, and Jonathan Schler
2007 “Mining the Blogosphere: Age, Gender and the Varieties of Self-Expression.”
First Monday 12 (9).
Audring, Jenny
2006 “Pronominal Gender in Spoken Dutch.”
Journal of Germanic Linguistics 18 (2): 85–116.
Audring, Jenny
2009 Reinventing Pronoun Gender. PhD thesis, Free University, Amsterdam.
Baayen, R. Harald
2001 Word Frequency Distributions. Springer.
Baldwin, Tim, Paul Cook, Marco Lui, Andrew MacKinlay, and Li Wang
2013 “How Noisy Social Media Text, how dffrnt Social Media Sources.”
International Joint Conference on Natural Language Processing.
Bamman, David, Jacob Eisenstein, and Tyler Schnoebelen
2014 “Gender Identity and Lexical Variation in Social Media.”
Journal of Sociolinguistics 18 (2): 135–160.
Barbiers, Sjef, Leonie Cornips, and Jan Pieter Kunst
2007 “The Syntactic Atlas of the Dutch Dialects (SAND): A Corpus of Elicited Speech and Text as an Online Dynamic Atlas.” In
Creating and Digitizing Language Corpora, ed. by Joan Beal, Karen Corrigan, and Hermann Moisl, 54–90. Palgrave Macmillan, New York.
Biemann, Chris, Felix Bildhauer, Stefan Evert, Dirk Goldhahn, Uwe Quasthoff, Roland Schäfer, Johannes Simon, Leonard Swiezinski, and Torsten Zesch
2013 “Scalable Construction of High-Quality Web Corpora.”
Journal for Language Technology and Computational Linguistics 28 (2): 23–60.
Bouma, Gosse
2015 “N-gram Frequencies for Dutch Twitter Data.”
Computational Linguistics in the Netherlands Journal 5: 25–36.
Brants, Thorsten, and Alex Franz
2009 Web 1T 5-gram, 10 European Languages Version 1 LDC2009T25. Linguistic Data Consortium, Philadelphia,
[URL].
Brysbaert, Marc, Michaël Stevens, Simon De Deyne, Wouter Voorspoels, and Gert Storms
2014 “Norms of Age of Acquisition and Concreteness for 30,000 Dutch Words.”
Acta Psychologica 150: 80–84.
Cornips, Leonie
2002 “Etnisch Nederlands.” In
Een buurt in beweging: talen en culturen in het Utrechtse Lombok en Transvaal, ed. by H. Bennis, G. Extra, P. Muysken, and J. Nortier, 285–302. Stichting Beheer IISG, Amsterdam.
Cornips, Leonie
2008 “Losing Grammatical Gender in Dutch: The Result of Bilingual Acquisition and/or an Act of Identity?”
International Journal of Bilingualism 12 (1–2): 105–124.
Cornips, Leonie, Mara van der Hoek, and Ramona Verwer
2006 “The Acquisition of Grammatical Gender in Bilingual Child Acquisition of Dutch (by Older Moroccan and Turkish Children). The Definite Determiner, Attributive Adjective and Relative Pronoun.” In
Linguistics in the Netherlands. Amsterdam: John Benjamins.
De Decker, Benny, and Reinhild Vandekerckhove
2012 “Stabilizing Features in Substandard Flemish: The Chat Language of Flemish Teenagers as a Test Case.”
Zeitschrift für Dialektologie und Linguistik 79 (2): 129–148.
De Vogelaer, Gunther, and Gert De Sutter
2011 “The Geography of Gender Change: Pronominal and Adnominal Gender in Flemish Dialects of Dutch.”
Language Sciences 33 (1): 192–205.
De Vos, Lien, and Gunther De Vogelaer
2011 “Dutch Gender and the Locus of Morphological Regularization.”
Folia Linguistica 45 (2): 245–281.
Eisenstein, Jacob
2013a “What to Do about Bad Language on the Internet.” In
Proceedings of NAACL-HLT, Association for Computational Linguistics, Atlanta, 359–369.
Eisenstein, Jacob
2013b “Phonological Factors in Social Media Writing.”
NAACL 2013, Association for Computational Linguistics, Atlanta, 11–19.
Eisenstein, Jacob, Brendan O’Connor, Noah A. Smith, and Eric P. Xing
2014 “Diffusion of Lexical Change on Social Media.”
PLOS ONE 9 (1).
Geerts, Guido, Walter Haeseryn, Jaap de Rooij, and Maarten C. van den Toorn
1984 Algemene Nederlandse Spraakkunst. Groningen: Wolters-Noordhoff.
Gheuens, Koen
2012 “Spelling op het internet; de chaos becijferd.”
Levende Talen 11: 26–35.
Hu, Yuheng, Kartik Talamadupula, and Subbarao Kambhampati
2013 “Dude, srsly? The Surprisingly Formal Nature of Twitter’s Language.” In
7th International AAAI Conference on Weblogs and Social Media (ICWSM), Association for the Advancement of Artificial Intelligence.
Jurafsky, Dan, Victor Chahuneau, Bryan R. Routledge, and Noah A. Smith
2014 “Narrative Framing of Consumer Sentiment in Online Restaurant Reviews.”
First Monday 19 (4).
Kraaikamp, Margot
2012 “The Semantics of the Dutch Gender System.”
Journal of Germanic Linguistics 24 (3): 193–232.
Labov, William
1972 Sociolinguistic Patterns. Philadelphia: University of Pennsylvania Press.
Labov, William
1990 “The Intersection of Sex and Social Class in the Course of Linguistic Change.”
Language Variation and Change 2 (2): 205–254.
Lemmens, Maarten
2013 “Van (neutraal) tussenwerpsel naar (positief) evaluatief adjectief: ça va en oké in het Nederlands.”
Internationale Linguistiek 11: 5–28.
Malvern, David, and Brian Richards
2012 “Measures of Lexical Richness.” In
The Encyclopedia of Applied Linguistics. Wiley Online.
Monroe, Burt L., Michael P. Colaresi, and Kevin M. Quinn
2008 “Fightin’ Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict.”
Political Analysis 16 (4): 372–403.
Nguyen, Dong, Noah A. Smith, and Carolyn P. Rosé
2011 “Author Age Prediction from Text Using Linear Regression.” In
Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, 115–123. Atlanta: Association for Computational Linguistics.
Nguyen, Dong, Dolf Trieschnigg, and Theo Meder
2013 “Tweetgenie: Development, Evaluation, and Lessons Learned.” In
ACM SIGWEB Newsletter 4.
Nguyen, Dong, A. Seza Doğruöz, Carolyn P. Rosé, and Franciska de Jong
2015 “Computational Sociolinguistics: A Survey.”
Computational Linguistics 42 (3): 537–593.
Oostdijk, Nelleke
2000 “The Spoken Dutch Corpus: Overview and first evaluation.” In
Proceedings of LREC 2000, 887–894, Athens: European Language Resources Association.
Rao, Delip, David Yarowsky, Abhishek Shreevats, and Manaswi Gupta
2010 “Classifying Latent User Attributes in Twitter.” In
Proceedings of the 2nd international workshop on Search and mining user-generated contents, 37–44. Association for Computing Machinery.
Tagliamonte, Sali A.
2011 Variationist Sociolinguistics: Change, Observation, Interpretation. Oxford: John Wiley & Sons.
Tjong Kim Sang, Erik
2011 “Het gebruik van Twitter voor taalkundig onderzoek.”
TABU: Bulletin voor Taalwetenschap 39 (1/2): 62–72.
Tjong Kim Sang, Erik, and Antal van den Bosch
2013 “Dealing with Big Data: The Case of Twitter.”
Computational Linguistics in the Netherlands Journal 3: 121–134.
Unsworth, Sharon, and Aafke Hulk
2010 “L1 Acquisition of Neuter Gender in Dutch: Production and Judgement.” In
Language acquisition and development: proceedings of GALA 2009. Cambridge: Cambridge Scholars.
van Halteren, Hans, and Nelleke Oostdijk
2014 “Variability in Dutch Tweets. An Estimate of the Proportion of Deviant Word Tokens.”
Journal for Language Technology and Computational Linguistics 29 (2): 97–124.
van Noord, Gertjan
2006 “At Last Parsing is Now Operational.” In
TALN06. Verbum Ex Machina. Actes de la 13e conférence sur le traitement automatique des langues naturelles, ed. by Piet Mertens, Cedrick Fairon, Anne Dister, and Patrick Watrin, 20–42. Louvain: Presses Universitaires de Louvain.
van Noord, Gertjan, Gosse Bouma, Frank van Eynde, Daniel de Kok, Jelmer van der Linde, Ineke Schuurman, Erik Tjong Kim Sang, and Vincent Vandeghinste
2013 “Large Scale Syntactic Annotation of Written Dutch: Lassy.” In
Essential Speech and Language Technology for Dutch: The STEVIN Programme, ed. by Peter Spyns and Jan Odijk, 147–164. Berlin: Springer.
Zaenen, Annie, Jean Carletta, Gregory Garretson, Joan Bresnan, Andrew Koontz-Garboden, Tatiana Nikitina, M. Catherine O’Connor, and Tom Wasow
2004 “Animacy Encoding in English: Why and How.” In
Proceedings of the 2004 ACL workshop on discourse annotation, 118–125. Atlanta: Association for Computational Linguistics.
Cited by
De Vos, Lien, Gert De Sutter, and Gunther De Vogelaer
2021 “Weighing Psycholinguistic and Social Factors for Semantic Agreement in Dutch Pronouns.”
Journal of Germanic Linguistics 33 (1): 30 ff.