Vol. 19:1 (2018) ► pp.61–79
Filtered collocations as features in verbal polysemy disambiguation
A case study of the Chinese verb kao ‘bake’
In Generative Lexicon Theory (GLT) (Pustejovsky 1995), co-composition is one of the generative devices proposed to explain cases of verbal polysemy in which more than one function application is allowed. The English baking verbs have been used to illustrate how a verb's arguments co-specify the verb through qualia unification. Some studies (Blutner 2002; Carston 2002; Falkum 2007) have argued that pragmatic information and world knowledge need to be considered as well. This study therefore examines whether GLT can be put into practice in a real-world Natural Language Processing (NLP) application using collocations. We conducted a fine-grained logical polysemy disambiguation task, using the open-source Leiden Weibo Corpus as the data source and a Support Vector Machine (SVM) classifier. The classifier takes verbs collocated under GLT as its main features; measure words and syntactic patterns are extracted as additional features for comparison. Our study investigates the logical polysemy of the Chinese verb kao ‘bake’. We find that GLT can help identify logically polysemous cases, and that the additional features help the classifier achieve higher performance.
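The general idea of classifying nouns by their collocates can be sketched as follows. This is a minimal illustration, not the authors' implementation: the English nouns, the `v:` (collocated verb) and `mw:` (measure word) features, and the labels are invented toy data, and the classifier is a plain linear SVM trained by subgradient descent on the hinge loss, an assumption made only to keep the sketch dependency-free. The abstract does not state the kernel or settings that were actually used.

```python
# Toy illustration only: nouns, features and labels are invented,
# not taken from the Leiden Weibo Corpus or from the article's data.

# Hypothetical collocates per noun: "v:" = collocated verb, "mw:" = measure word.
COLLOCATIONS = {
    "cake": {"v:make", "v:decorate", "mw:piece"},
    "bread": {"v:make", "v:knead", "mw:loaf"},
    "cookie": {"v:make", "v:decorate", "mw:piece"},
    "potato": {"v:peel", "v:wash", "mw:piece"},
    "chicken": {"v:marinate", "v:wash", "mw:piece"},
    "yam": {"v:peel", "v:wash"},
}
# +1 = creation sense, -1 = change-of-state sense (toy labels).
LABELS = {"cake": 1, "bread": 1, "cookie": 1,
          "potato": -1, "chicken": -1, "yam": -1}


def build_matrix(collocations):
    """Turn collocate sets into binary feature vectors over a sorted vocabulary."""
    vocab = sorted({f for feats in collocations.values() for f in feats})
    rows = {noun: [1.0 if f in feats else 0.0 for f in vocab]
            for noun, feats in collocations.items()}
    return vocab, rows


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on the L2-regularised hinge loss (linear SVM)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (dot(w, xi) + b)
            # Regularisation step: shrink the weights slightly.
            w = [wj - lr * lam * wj for wj in w]
            # Hinge-loss step: only examples inside the margin move the weights.
            if margin < 1:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1


vocab, rows = build_matrix(COLLOCATIONS)
nouns = list(COLLOCATIONS)
w, b = train_linear_svm([rows[n] for n in nouns], [LABELS[n] for n in nouns])

for n in nouns:
    print(n, predict(w, b, rows[n]))
```

In this toy setup the verb features carry most of the signal, while the shared measure word `mw:piece` does little to separate the two classes. Comparing feature sets in the way the abstract describes would amount to training the same classifier with and without the `mw:` columns.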
Article outline
- 1. Introduction
- 2. Co-composition and qualia structure in GLT
- 3. Methodology
- 3.1 Data collection
- 3.2 Filtering nouns as seeds
- 3.3 Selection of collocated features: verbs, measure words and syntactic constructions
- 3.3.1 Collocated verbs and measure words
- 3.3.2 Collocated syntactic patterns
- 3.4 Constructing a data frame with features for SVM classification
- 3.5 SVM classification
- 4. Analysis and discussion
- 4.1 The collocated verbs with measure words
- 4.1.1 Features for nouns with change of state senses
- 4.1.2 Features for nouns with creation senses
- 4.2.3 Others
- 4.2 Pattern 3+4
- 5. Conclusion
- Acknowledgements
- Notes
- Abbreviations
- References
https://doi.org/10.1075/lali.00003.cha