Statistical sequence and parsing models for descriptive linguistics and psycholinguistics
This study shows that computational linguistic models are beneficial for descriptive linguistics and psycholinguistics. It applies two models, 1) surprisal and 2) a syntactic parser, to various English genres and to learner language, which allows us to investigate the role of ambiguity and the interplay between the idiom principle and the syntax principle. We find that surprisal and ambiguity are higher for learner language, while parser scores and model fit are lower. In addition, the random application of alternations leads to more ambiguous sentences. Failures to generate optimal orderings in the sense of relevance theory, such as the nonnative-like utterances that language learners produce, increase processing load for both human and automatic processors. As human and automatic parsing difficulties correlate, we suggest syntactic parsers as psycholinguistic processing models.
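As an illustration only: the surprisal of a word is commonly defined as -log2 P(w_i | context), so a less expected word carries more information and, on this view, more processing load. The sketch below estimates that probability with an add-one smoothed bigram model. This is an assumption: the outline mentions n-gram models, but the model order, smoothing and training corpus used in the study are not given here. As one possible operationalisation of uniform information density (UID), it also reports the variance of surprisal across a sentence. The toy corpus and the names `train_bigram`, `surprisals` and `uid_deviation` are hypothetical.

```python
import math
from collections import Counter
from statistics import pvariance


def train_bigram(sentences):
    """Collect history and bigram counts from tokenised training sentences."""
    history, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        history.update(padded[:-1])                      # counts of each conditioning word
        bigrams.update(zip(padded[:-1], padded[1:]))     # counts of (previous, next) pairs
    # Possible next tokens: training word types, the end marker, and an unknown-word slot.
    vocab = {w for s in sentences for w in s} | {"</s>", "<unk>"}
    return history, bigrams, vocab


def surprisals(tokens, model):
    """Per-token surprisal -log2 P(w_i | w_{i-1}) under an add-one smoothed bigram model."""
    history, bigrams, vocab = model
    padded = ["<s>"] + [w if w in vocab else "<unk>" for w in tokens] + ["</s>"]
    result = []
    for prev, word in zip(padded[:-1], padded[1:]):
        p = (bigrams[(prev, word)] + 1) / (history[prev] + len(vocab))
        result.append(-math.log2(p))
    return result


def uid_deviation(values):
    """Population variance of surprisal: lower values mean a more uniform information profile."""
    return pvariance(values)


# Hypothetical toy corpus, for illustration only.
corpus = [["the", "dog", "barks"], ["the", "dog", "sleeps"], ["the", "cat", "sleeps"]]
model = train_bigram(corpus)
native = surprisals(["the", "dog", "sleeps"], model)
scrambled = surprisals(["dog", "the", "sleeps"], model)
print(sum(native) < sum(scrambled))  # True
```

The unusual word order receives a higher total surprisal, which mirrors, on a toy scale, the abstract's observation that nonnative-like orderings raise surprisal. A realistic setup would use a much larger corpus and a better-smoothed or neural language model.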
Article outline
- 1. Introduction
- 2. Background and motivation: Language models
- 2.1 A case for statistical language models in linguistics
- 2.1.1 Significance tests are not enough
- 2.1.1.1 Assumption of random distribution
- 2.1.1.2 Assumption of independence from other factors
- 2.1.1.3 Assumption of free choice
- 2.1.2 The envelope of variation
- 2.1.3 Binary local decisions
- 2.2 Models for natural language processing
- 2.2.1 N-gram models and the idiom principle
- 2.2.2 Syntactic models: Distributed interdependent decisions
- 2.2.2.1 Ambiguity
- 2.2.2.2 The idiom and syntax principle in a tug-of-war
- 2.2.2.3 Cognitive plausibility
- 2.2.2.4 Model parameters
- 2.2.2.5 Local and Global Models in Interaction
- 2.3 L1 and L2 data
- 3. Data and methodology
- 3.1 Data
- 3.2 Surprisal and UID
- 3.3 High levels of residuals and low model fit of parsers as indicator
- 4. Results: Two language processing models
- 4.1 Surprisal at the level of word sequences
- 4.2 Syntactic parser as a processing model
- 4.2.1 Parser accuracy
- 4.2.2 Parser model fit
- 5. Ambiguity
- 5.1 Garden-path sentences
- 5.2 Avoidance of ambiguity
- 5.3 Forcing rare constituent order and alternative lexis
- 6. Conclusions
- Notes
- References