Improving part-of-speech guessing of Chinese unknown words using hybrid models

Lu, Xiaofei

doi:10.1075/ijcl.13.2.03lu

Article published in:

International Journal of Corpus Linguistics
Vol. 13:2 (2008), pp. 169–193

Xiaofei Lu | The Pennsylvania State University

This paper presents a hybrid model for part-of-speech (POS) guessing of Chinese unknown words. Most previous studies on this task have developed a single unified statistical model for all Chinese unknown words and have rejected rule-based models without testing them. We argue that models that use different sources of information about unknown words, both structural and contextual, can be effective for handling different types of unknown words. We propose a rule-based model that uses information about the type, length, and internal structure of unknown words, and we combine it with two existing statistical models that use information about the POS context and the component characters of unknown words, respectively. By combining the complementary strengths of three models that draw on different sources of information, the hybrid model achieves an accuracy of 89%, a significant improvement over the best result reported in previous studies.
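The abstract does not say how the three models are combined, so the following is only a minimal sketch of one possible arrangement: a fallback cascade in which the rule-based guesser answers first and the two statistical guessers are consulted when it abstains. The ordering, the function names, the toy rules, the lookup tables, and the tag labels are all assumptions made for illustration; none of them is taken from the paper.

```python
from typing import Callable, Optional, Sequence

# A guesser maps (unknown word, POS tag of the preceding word) to a tag,
# or returns None when it has nothing to say about the word.
Guesser = Callable[[str, str], Optional[str]]


def rule_based(word: str, prev_tag: str) -> Optional[str]:
    """Hypothetical structural rules based on word type and length."""
    if word.isdigit():
        return "numeral"  # toy rule: a digit string is a numeral
    if len(word) == 3 and word[0] in "王李张":
        return "proper-noun"  # toy rule: surname + two-character given name
    return None


def character_model(word: str, prev_tag: str) -> Optional[str]:
    """Stand-in for a statistical model over component characters."""
    last_char_tags = {"性": "noun", "化": "verb"}  # invented lookup table
    return last_char_tags.get(word[-1:])


def context_model(word: str, prev_tag: str) -> Optional[str]:
    """Stand-in for a statistical model over the POS context."""
    likely_next = {"determiner": "noun", "adverb": "verb"}  # invented table
    return likely_next.get(prev_tag)


def hybrid_guess(
    word: str,
    prev_tag: str,
    guessers: Sequence[Guesser] = (rule_based, character_model, context_model),
    default: str = "noun",
) -> str:
    """Return the first non-abstaining guess, or the default tag."""
    for guess in guessers:
        tag = guess(word, prev_tag)
        if tag is not None:
            return tag
    return default


if __name__ == "__main__":
    for word, prev_tag in [("2008", "verb"), ("现代化", "adverb"), ("博客", "determiner")]:
        print(word, hybrid_guess(word, prev_tag))
```

A cascade is used here only because it is the simplest way to let each information source handle the words it is suited to; weighted voting or confidence thresholds would be equally plausible readings of "hybrid".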

Keywords: Chinese unknown words, POS tagging, rule-based models, hybrid models, corpus annotation, linguistic knowledge

Published online: 26 May 2008
