Chapter 9
Are referent introductions sensitive to forward planning in discourse?
Evidence from Multi-CAST
It has been argued that speakers employ morphosyntactic structures such as presentationals and left-dislocations (Lambrecht 1994) to establish new entities in discourse due to considerations of referent accessibility vis-à-vis event processing (Du Bois 1987; Chafe 1987). We here investigate whether introductions are sensitive to the salience of the discourse referent in subsequent discourse (Himmelmann 1996; Lichtenberk 1996). This hypothesis is tested against spoken corpus data from twelve diverse languages. While the use of specific morphosyntactic structures does correlate with discourse prominence, humanness has a much stronger effect. Subsequent discourse salience is hence not the chief determinant of the syntactic positions of new mentions; the convergence of humanness and semantic role associations in specific syntactic positions better explains the attested patterns.
Article outline
- 1. Introduction
- 2. Background: Building a multilingual corpus for typological research in discourse and grammar
- 2.1 Multi-CAST corpus building and corpus composition
- 2.2 Multi-CAST corpus annotations
- 3. Case study: Patterns of referent introduction vis-à-vis their discourse salience
- 3.1 Referent introduction as a challenge to processing
- 3.2 Establishing new referents for subsequent discourse
- 3.2.1 Role of introductions by frequency in immediate subsequent discourse
- 3.2.2 Role of introductions by overall discourse frequency
- 3.2.3 Role of introductions by humanness
- 3.2.4 Discussion
- 4. Summary and conclusions
- Notes
- Abbreviations
- References
References (87)
Adibifar, Shirin. 2016. Multi-CAST Persian. In Multi-CAST: Multilingual Corpus of Spoken Annotated Texts, Geoffrey Haig & Stefan Schnell (eds). Bamberg: University of Bamberg. [URL] (23 July 2022).
Andrews, Avery. 2007. The major functions of the noun phrase. In Language Typology and Syntactic Description, Timothy Shopen (ed), 132–223. Cambridge: CUP.
Ariel, Mira. 2008. Pragmatics and Grammar. Cambridge: CUP.
Arnold, Jennifer E. 2008. Reference production. Language and Cognitive Processes 23(4): 495–527.
Arnold, Jennifer E., Fagnano, Maria & Tanenhaus, Michael K. 2003. Disfluencies signal theee, um, new information. Journal of Psycholinguistic Research 32(1): 25–36.
Arnold, Jennifer E., Tanenhaus, Michael K., Altmann, Rebecca J. & Fagnano, Maria. 2004. The old and thee, uh, new. Psychological Science 15(9): 578–582.
Barth, Danielle & Schnell, Stefan. 2022. Understanding Corpus Linguistics. London: Routledge.
Barth, Danielle & Evans, Nicholas. 2017. SCOPIC design and overview. In The Social Cognition Parallax Interview Corpus (SCOPIC): A Cross-linguistic Resource, Danielle Barth & Nicholas Evans (eds), 1–23. Honolulu HI: University of Hawai’i Press.
Barth, Danielle, Evans, Nicholas, Arka, I Wayan, Bergqvist, Henrik, Forker, Diana, Gipper, Sonja, Hodge, Gabrielle, Kashima, Eri, Kasuga, Yuki, Kawakami, Carine, Kimoto, Yukinori, Knuchel, Dominique, Kogura, Norikazu, Kurabe, Keita, Mansfield, John, Narrog, Heiko, Putu Eka Pratiwi, Desak, van Putten, Saskia, Senge, Chikako & Tykhostup, Olena. 2021. Language vs. individuals in cross-linguistic corpus typology. In Doing Corpus-based Typology with Spoken Language Corpora: State of the Art, Geoffrey Haig, Stefan Schnell & Frank Seifart (eds), 179–232. Honolulu HI: University of Hawai’i Press.
Bentz, Christian & Ferrer-i-Cancho, Ramon. 2016. Zipf’s law of abbreviation as a language universal. In Proceedings of the Leiden Workshop on Capturing Phylogenetic Algorithms for Linguistics, Christian Bentz, Gerhard Jäger & Igor Yanovich (eds). Tübingen: University of Tübingen.
Bickel, Balthasar. 2003. Referential density in discourse and syntactic typology. Language 79(4): 708–736.
Bogomolova, Natalia, Ganenkov, Dimitry & Schiborr, Nils N. 2021. Multi-CAST Tabasaran. In Multi-CAST: Multilingual Corpus of Spoken Annotated Texts, Geoffrey Haig & Stefan Schnell (eds). Bamberg: University of Bamberg. [URL] (23 July 2022).
Bresnan, Joan, Dingare, Shipra & Manning, Christopher D. 2001. Soft constraints mirror hard constraints: Voice and person in English and Lummi. In Proceedings of the LFG 01 Conference, Miriam Butt & Tracy H. King (eds), 13–32. Stanford CA: CSLI.
Brickell, Timothy. 2016. Multi-CAST Tondano. In Multi-CAST: Multilingual Corpus of Spoken Annotated Texts, Geoffrey Haig & Stefan Schnell (eds). Bamberg: University of Bamberg. [URL] (23 July 2022).
Brickell, Timothy & Schnell, Stefan. 2017. Do grammatical relations reflect information status? Reassessing preferred argument structure against discourse data from Tondano. Linguistic Typology 21(1): 177–209.
Chafe, Wallace. 1980. The Pear Stories: Cognitive, Cultural, and Linguistic Aspects of Narrative Production. Norwood NJ: Ablex.
Croft, William. 2010. The origins of grammaticalization in the verbalization of experience. Linguistics 48(1): 1–48.
Davies, Mark. 2008. The Corpus of Contemporary American English (COCA). [URL] (23 July 2022).
Dingemanse, Mark, Rossi, Giovanni & Floyd, Simeon. 2017. Place reference in story beginnings: A cross-linguistic study of narrative and interactional affordances. Language in Society 46(2): 129–158.
Du Bois, John. 1987. The discourse basis of ergativity. Language 63(4): 805–855.
Duranti, Alessandro. 1994. From Grammar to Politics: Linguistic Anthropology in a Western Samoan Village. Berkeley CA: University of California Press.
Durie, Mark. 2003. New light on information pressure. In Preferred Argument Structure [Studies in Discourse and Grammar 14], John Du Bois, Lorraine Kumpf & William J. Ashby (eds), 159–196. Amsterdam: John Benjamins.
Evans, Nicholas & Levinson, Stephen C. 2009. The myth of language universals: Language diversity and its importance for cognitive science. Behavioral and Brain Sciences 32(5): 429–492.
Forker, Diana & Schiborr, Nils N. 2019. Multi-CAST Sanzhi Dargwa. In Multi-CAST: Multilingual Corpus of Spoken Annotated Texts, Geoffrey Haig & Stefan Schnell (eds). Bamberg: University of Bamberg. [URL] (23 July 2022).
Friedman, Jerome H., Hastie, Trevor & Tibshirani, Robert. 2000. Additive logistic regression: A statistical view of boosting. Annals of Statistics 28(2): 337–407.
Givón, Talmy. 1983. Introduction. In Topic Continuity in Discourse [Typological Studies in Language 1], Talmy Givón (ed), 5–41. Amsterdam: John Benjamins.
Greenberg, Joseph H. 1954[1960]. A quantitative approach to the morphological typology of language. In Method and Perspective in Anthropology, Robert F. Spencer (ed), 192–220. Chicago IL: The University of Chicago Press.
Hadjidas, Harris & Vollmer, Maria. 2015. Multi-CAST Cypriot Greek. In Multi-CAST: Multilingual Corpus of Spoken Annotated Texts, Geoffrey Haig & Stefan Schnell (eds). Bamberg: University of Bamberg. [URL] (23 July 2022).
Haig, Geoffrey & Schnell, Stefan. 2014. Annotations using GRAID (Grammatical Relations and Animacy in Discourse): Introduction and Guidelines for Annotators. Version 7.0. Bamberg: University of Bamberg. [URL] (23 July 2022).
Haig, Geoffrey & Schnell, Stefan. 2016. The discourse basis of ergativity revisited. Language 92(3): 591–618.
Haig, Geoffrey & Schnell, Stefan. 2021. Multi-CAST: Multilingual Corpus of Annotated Spoken Texts. Bamberg: University of Bamberg. [URL] (23 July 2022).
Haig, Geoffrey, Schnell, Stefan & Schiborr, Nils N. 2021. Universals of reference in discourse and grammar: Evidence from the Multi-CAST collection of spoken corpora. In Doing Corpus-based Typology with Spoken Language Corpora: State of the Art, Geoffrey Haig, Stefan Schnell & Frank Seifart (eds), 141–177. Honolulu HI: University of Hawai’i Press.
Haig, Geoffrey, Vollmer, Maria & Thiele, Hanna. 2019. Multi-CAST Northern Kurdish. In Multi-CAST: Multilingual Corpus of Spoken Annotated Texts, Geoffrey Haig & Stefan Schnell (eds). Bamberg: University of Bamberg. [URL] (23 July 2022).
Haig, Geoffrey, Schnell, Stefan & Wegener, Claudia. 2011. Comparing corpora from endangered languages: Explorations in language typology based on original texts. In Documenting Endangered Languages: Achievements and Perspectives, Geoffrey Haig, Nicole Nau, Stefan Schnell & Claudia Wegener (eds), 55–86. Berlin: Mouton de Gruyter.
Haspelmath, Martin. 2006. Review of Preferred argument structure: Grammar as architecture for function, by John Du Bois, Lorraine Kumpf, and William Ashby. Language 82(4): 908–912.
Haspelmath, Martin. 2021. Explaining grammatical coding asymmetries: Form-frequency correspondences and predictability. Journal of Linguistics 57(3): 605–633.
Hawkins, John A. 2004. Efficiency and Complexity in Grammars. Oxford: OUP.
Hawkins, John A. 2014. Cross-linguistic Variation and Efficiency. Oxford: OUP.
Hildebrandt, Kristine A., Jany, Carmen & Silva, Wilson (eds). 2017. Documenting Variation in Endangered Languages [Language Documentation & Conservation special publication 13]. Honolulu HI: University of Hawai’i Press.
Himmelmann, Nikolaus P. 1998. Documentary and descriptive linguistics. Linguistics 36(2): 161–195.
Hopper, Paul J. & Thompson, Sandra A. 1980. Transitivity in grammar and discourse. Language 56(2): 251–299.
Karttunen, Lauri. 1976. Discourse referents. In Syntax and Semantics, Vol. 7: Notes from the Linguistic Underground, James D. McCawley (ed), 363–385. New York NY: Academic Press.
Kimoto, Yukinori. 2019. Multi-CAST Arta. In Multi-CAST: Multilingual Corpus of Spoken Annotated Texts, Geoffrey Haig & Stefan Schnell (eds). Bamberg: University of Bamberg. [URL] (23 July 2022).
Kurabe, Keita. 2021. Multi-CAST Jinghpaw. In Multi-CAST: Multilingual Corpus of Spoken Annotated Texts, Geoffrey Haig & Stefan Schnell (eds). Bamberg: University of Bamberg. [URL] (23 July 2022).
Lambrecht, Knud. 1994. Information Structure and Sentence Form: Topic, Focus, and the Mental Representation of Discourse Referents. Cambridge: CUP.
Levelt, Willem J. M. 1989. Speaking: From Intention to Articulation. Cambridge MA: The MIT Press.
Levshina, Natalia. 2019. Token-based typology and word order entropy: A study based on universal dependencies. Linguistic Typology 23(3): 533–572.
Levshina, Natalia. 2021. Corpus-based typology: Applications, challenges, and some solutions. Linguistic Typology 26(1): 129–160.
Levshina, Natalia & Moran, Steve (eds). 2021. Efficiency in human languages: Corpus evidence for universal principles. Linguistics Vanguard 7(s3): 20200081.
Marslen-Wilson, William, Levy, Elena & Taylor, Lorraine K. 1982. Producing interpretable discourse: The establishment and maintenance of reference. In Speech, Place, and Action: Studies in Deixis and Related Topics, Robert J. Jarvella & Wolfgang Klein (eds), 339–378. Chichester: John Wiley & Sons.
Mayer, Mercer. 1969. Frog, Where Are You? New York NY: Dial Books for Young Readers.
Mayer, Thomas & Cysouw, Michael. 2014. Creating a massively parallel Bible corpus. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC’14), Reykjavik, Iceland, 26–31 May 2014, Nicoletta Calzolari (ed), 3148–3163. Reykjavik: European Language Resources Association (ELRA).
McEnery, Tony, Tanaka, Izumi & Botley, Simon. 2000. Corpus annotating and reference resolution. In ANARESOLUTION ’97: Proceedings of a Workshop on Operational Factors in Practical, Robust Anaphora Resolution for Unrestricted Texts, Ruslan Mitkov & Branimir Boguraev (eds), 57–74. Stroudsburg PA: Association for Computational Linguistics.
Meng, Chenxi. 2019. Multi-CAST Tulil. In Multi-CAST: Multilingual Corpus of Spoken Annotated Texts, Geoffrey Haig & Stefan Schnell (eds). Bamberg: University of Bamberg. [URL] (23 July 2022).
Mosel, Ulrike & Hovdhaugen, Even. 1992. The Samoan Reference Grammar. Oslo: Scandinavian Press.
Mosel, Ulrike & Schnell, Stefan. 2015. Multi-CAST Teop. In Multi-CAST: Multilingual Corpus of Spoken Annotated Texts, Geoffrey Haig & Stefan Schnell (eds). Bamberg: University of Bamberg. [URL] (23 July 2022).
Newman, John, Baayen, Harald & Rice, Sally (eds). 2011. Corpus-based Studies in Language Use, Language Learning, and Language Documentation. Amsterdam: Rodopi Press.
Ozerov, Pavel. 2021a. Multifactorial information management (MIM): Summing up the emerging alternative to information structure. Linguistics Vanguard 7(1): 20200039.
Ozerov, Pavel. 2021b. This research topic of yours – Is it a research topic at all? Using comparative interactional data for a fine-grained reanalysis of traditional concepts. In Doing Corpus-based Typology with Spoken Language Corpora: State of the Art, Geoffrey Haig, Stefan Schnell & Frank Seifart (eds), 233–280. Honolulu HI: University of Hawai’i Press.
Prince, Ellen F. 1981. Toward a taxonomy of given-new information. In Radical Pragmatics, Peter Cole (ed), 223–255. New York NY: Academic Press.
Prince, Ellen F. 1998. On the limits of syntax, with reference to left-dislocation and topicalization. In Syntax and Semantics, Vol. 29: The Limits of Syntax, Peter W. Culicover & Louise McNally (eds), 281–302. San Diego CA: Academic Press.
Riester, Arndt & Baumann, Stefan. 2017. The RefLex scheme – Annotation guidelines. SinSpeC: Working Papers of the SFB 732 14. [URL] (23 July 2022).
San Roque, Lila, Rumsey, Alan, Gawne, Lauren, Spronck, Stef, Hoenigman, Darja, Carroll, Alice, Miller, Julia & Evans, Nicholas. 2012. Getting the story straight: Language fieldwork using a narrative problem-solving task. Language Documentation and Conservation 6: 135–174.
Schiborr, Nils N. 2015. Multi-CAST English. In Multi-CAST: Multilingual Corpus of Spoken Annotated Texts, Geoffrey Haig & Stefan Schnell (eds). Bamberg: University of Bamberg. [URL] (23 July 2022).
Schiborr, Nils N. 2021a. Lexical Anaphora: A Corpus-based Typological Study of Referential Choice. PhD dissertation, University of Bamberg.
Schiborr, Nils N. 2021b. multicastR: A companion to the Multi-CAST collection. R package version 2.0.0. [URL] (23 July 2022).
Schiborr, Nils N., Schnell, Stefan & Thiele, Hanna. 2018. RefIND – Referent Indexing in Natural-language Discourse: Annotation Guidelines. Version 1.1. Bamberg: University of Bamberg. [URL] (23 July 2022).
Schnell, Stefan. 2015. Multi-CAST Vera’a. In Multi-CAST: Multilingual Corpus of Spoken Annotated Texts, Geoffrey Haig & Stefan Schnell (eds). Bamberg: University of Bamberg. [URL] (23 July 2022).
Schnell, Stefan & Schiborr, Nils N. 2022. Cross-linguistic corpus studies in linguistic typology. Annual Review of Linguistics 8: 171–191.
Schnell, Stefan, Schiborr, Nils N. & Haig, Geoffrey. 2021. Efficiency in discourse processing: Does morphosyntax adapt to accommodate new referents? In Efficiency in Human Languages: Corpus Evidence for Universal Principles, Natalia Levshina & Steve Moran (eds). Linguistics Vanguard 7(s3): 20190064.
Seifart, Frank. 2021. Combining documentary linguistics and corpus phonetics to advance corpus-based typology. In Doing Corpus-based Typology with Spoken Language Corpora: State of the Art, Geoffrey Haig, Stefan Schnell & Frank Seifart (eds), 115–139. Honolulu HI: University of Hawai’i Press.
Seifart, Frank, Paschen, Ludger & Stave, Matthew (eds). 2022. Language Documentation Reference Corpus (DoReCo), Version 1.0. Leibniz-Zentrum Allgemeine Sprachwissenschaft & Laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). [URL] (23 July 2022).
Stolz, Thomas. 2007. Harry Potter meets Le petit prince – On the usefulness of parallel corpora in crosslinguistic investigations. STUF – Language Typology and Universals 60(2): 100–117.
Thieberger, Nicholas & Brickell, Timothy. 2018. Multi-CAST Nafsan. In Multi-CAST: Multilingual Corpus of Spoken Annotated Texts, Geoffrey Haig & Stefan Schnell (eds). Bamberg: University of Bamberg. [URL] (23 July 2022).
Torres Cacoullos, Rena & Travis, Catherine E. 2019. Variationist typology: Shared probabilistic constraints across (non-)null subject languages. Linguistics 57(3): 653–692.
Visser, Eline. 2021. Multi-CAST Kalamang. In Multi-CAST: Multilingual Corpus of Spoken Annotated Texts, Geoffrey Haig & Stefan Schnell (eds). Bamberg: University of Bamberg. [URL] (23 July 2022).
Vollmer, Maria. 2020. Multi-CAST Mandarin. In Multi-CAST: Multilingual Corpus of Spoken Annotated Texts, Geoffrey Haig & Stefan Schnell (eds). Bamberg: University of Bamberg. [URL] (23 July 2022).
Wälchli, Bernhard & Cysouw, Michael. 2012. Lexical typology through similarity semantics: Towards a semantic map of motion verbs. Linguistics 50(3): 671–710.
Wald, Benji. 1983. Referents and topics within and across discourse units. Observations from current vernacular English. In Discourse Perspectives on Syntax, Flora Klein-Andreu (ed), 91–116. New York NY: Academic Press.
Zeman, Daniel, Nivre, Joakim, Abrams, Mitchell et al. 2021. Universal Dependencies 2.8. Prague: Universal Dependencies Consortium. [URL] (25 July 2022).
Zipf, George K. 1935. The Psycho-biology of Language: An Introduction to Dynamic Philology. Cambridge MA: The MIT Press.
Cited by (1)
Ozerov, Pavel. 2024. Left Dislocation in Spoken Hebrew, it is neither topicalizing nor a construction. Linguistics.
This list is based on CrossRef data as of 5 November 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.