Investigating the additive probability of repeated language production decisions

Wallis, Sean

doi:10.1075/ijcl.17093.wal

Article published in:

International Journal of Corpus Linguistics
Vol. 24:4 (2019), pp. 490–521

Sean Wallis | Survey of English Usage, UCL

This paper introduces an experimental paradigm based on probabilistic evidence of the interaction between construction decisions in a parsed corpus. The approach is demonstrated using ICE-GB, a one-million-word corpus of English. It finds an interaction between attributive adjective phrases in noun phrases with a noun head, such that the probability of adding a further adjective phrase falls with each one added. The same pattern is much weaker in adverbs preceding a verb phrase, implying that this decline is not a universal phenomenon. Noun phrase postmodifying clauses exhibit a similar initial fall in the probability of adding successive clauses that modify the same NP head, and of embedding clauses that modify new NP heads. Successive postmodification shows a secondary phenomenon: an increase in additive probability in longer sequences, apparently due to ‘templating’ effects. The author argues that these results can only be explained as cognitive and communicative natural phenomena acting on and within recursive grammar rules.
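As a rough illustration of the quantity the abstract refers to, the sketch below computes an additive probability for each step in a sequence. It assumes one common reading of the term: given the number of cases containing at least x items (for example, noun phrases with at least x attributive adjective phrases), the additive probability at step x is the proportion of cases with at least x − 1 items that go on to add an x-th. The function names and the counts are hypothetical and are not taken from the paper or from ICE-GB. The Wilson score interval is included only as one standard way to attach uncertainty to an observed proportion.

```python
import math


def additive_probabilities(at_least):
    """Return (x, p, n) for each step x >= 1.

    at_least[x] is the number of cases with at least x items.
    p = at_least[x] / at_least[x - 1] is the observed proportion of
    cases with x - 1 items that add one more; n is the denominator.
    (Assumed definition, for illustration only.)
    """
    results = []
    for x in range(1, len(at_least)):
        n = at_least[x - 1]
        if n == 0:
            break  # no cases left that could add a further item
        results.append((x, at_least[x] / n, n))
    return results


def wilson_interval(p, n, z=1.959964):
    """Wilson score interval for an observed proportion p out of n trials.

    z defaults to the two-tailed critical value for a 95% interval.
    """
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Hypothetical counts: cases with at least 0, 1, 2 and 3 items.
    counts = [1000, 200, 20, 1]
    for x, p, n in additive_probabilities(counts):
        lower, upper = wilson_interval(p, n)
        print(f"x={x}: p={p:.3f} (n={n}), 95% interval ({lower:.3f}, {upper:.3f})")
```

A falling sequence of p values whose intervals do not overlap would be the kind of pattern described above as a successive decline. Whether any real decline is significant depends on the actual corpus frequencies and on the tests used in the paper itself.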

Keywords: additive probability, interaction evidence, language production, parsing, grammar evaluation

Article outline

1. Introduction
2. Syntactic annotation in corpus linguistics
- 2.1 Criteria for selecting and evaluating frameworks
- 2.2 Retrievability of linguistic events
- 2.3 Retrievability of patterns of interaction
3. Three experiments with attributive adjectives in noun phrases
- 3.1 Experiment 1: Attributive adjective phrases
- 3.2 Experiment 2: Attributive adjective phrases with proper and common noun heads
- 3.3 Experiment 3: Attributive adjectives, without parsing
4. Experiment 4: Grammatical interaction between preverbal adverb phrases
5. Experiment 5: Grammatical interaction in postmodifying clauses
- 5.1 Sequential postmodification
- 5.2 Embedded postmodification
- 5.3 Alternative explanations
- 5.4 Discussion
6. Conclusions
- 6.1 Implications for corpus linguistics
- 6.2 Towards the evaluation of grammar
Acknowledgements
Notes
References

Published online: 1 November 2019

https://doi.org/10.1075/ijcl.17093.wal
