Article published in: Text Corpora and Multilingual Lexicography
[International Journal of Corpus Linguistics 6:SI] 2001
pp. 35–42
Hybrid Approaches for Automatic Segmentation and Annotation of a Chinese Text Corpus
This paper describes hybrid approaches for the automatic segmentation and annotation of a Chinese text corpus. The hybrid approaches combine a rule-based method, a statistics-based method, and an automatic learning method. Experimental results are given; they indicate that combining the three methods noticeably improves the precision of segmentation and annotation of a Chinese text corpus.
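As a rough illustration of how the three kinds of method can be chained, the sketch below uses a rule-based step for segmentation, a statistics-based step (a bigram HMM decoded with the Viterbi algorithm) for tagging, and a Brill-style transformation step that corrects the HMM's tags. Everything here is an assumption for illustration, not the paper's method: forward maximum matching is swapped in as the rule-based segmenter, and the tag names (`r` pronoun, `v` verb, `n` noun), the probabilities in `START`, `TRANS` and `EMIT`, and the rule in `RULES` are all hypothetical toy values. The probabilities were chosen so that the HMM mis-tags 研究 and the transformation fixes it. In real transformation-based error-driven learning the rules are learned from errors against a hand-tagged corpus; that learning loop is omitted here.

```python
import math

# Hypothetical toy model (illustration only, not taken from the paper).
TAGS = ["r", "v", "n"]
START = {"r": 0.5, "v": 0.2, "n": 0.3}
TRANS = {  # P(next tag | previous tag)
    "r": {"r": 0.1, "v": 0.4, "n": 0.5},
    "v": {"r": 0.2, "v": 0.1, "n": 0.7},
    "n": {"r": 0.1, "v": 0.3, "n": 0.6},
}
EMIT = {  # P(word | tag)
    "r": {"我们": 0.5},
    "v": {"研究": 0.1},
    "n": {"研究": 0.3, "语言": 0.3},
}
FLOOR = 1e-8  # stand-in probability for unseen (tag, word) pairs
LEXICON = {w for probs in EMIT.values() for w in probs}

# Hypothetical transformation template: change from_tag to to_tag
# when the previous tag is prev_tag and the next tag is next_tag.
RULES = [("n", "v", "r", "n")]


def forward_max_match(text, lexicon, max_len=4):
    """Rule-based step: greedy longest match, falling back to one character."""
    words, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words


def viterbi(words):
    """Statistics-based step: most probable tag sequence under the bigram HMM."""
    def lp(p):
        return math.log(p if p > 0 else FLOOR)

    # Each column maps tag -> (best log probability ending in tag, previous tag).
    best = [{t: (lp(START[t]) + lp(EMIT[t].get(words[0], 0)), None) for t in TAGS}]
    for w in words[1:]:
        prev = best[-1]
        col = {}
        for t in TAGS:
            score, back = max((prev[s][0] + lp(TRANS[s][t]), s) for s in TAGS)
            col[t] = (score + lp(EMIT[t].get(w, 0)), back)
        best.append(col)
    # Backtrack from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for col in reversed(best[1:]):
        tag = col[tag][1]
        path.append(tag)
    return list(reversed(path))


def apply_transformations(tags, rules):
    """Correction step: apply transformation rules in order."""
    tags = list(tags)
    for frm, to, prev, nxt in rules:
        snapshot = list(tags)  # conditions read the tags as they were before this rule
        for i in range(1, len(tags) - 1):
            if snapshot[i] == frm and snapshot[i - 1] == prev and snapshot[i + 1] == nxt:
                tags[i] = to
    return tags


def tag_text(text):
    words = forward_max_match(text, LEXICON)
    tags = apply_transformations(viterbi(words), RULES)
    return list(zip(words, tags))


if __name__ == "__main__":
    print(tag_text("我们研究语言"))
```

With these toy numbers, segmentation yields 我们 / 研究 / 语言, the HMM alone tags them `r n n`, and the single rule rewrites the middle tag, so `tag_text("我们研究语言")` returns `[("我们", "r"), ("研究", "v"), ("语言", "n")]`. Keeping each stage as a separate function reflects the hybrid idea: any one stage can be replaced (for example, a different segmenter or a CLAWS-style tagger) without touching the others.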
Keywords: tagging, hybrid approach, rule-based approach, HMM (Hidden Markov Model), CLAWS (Constituent-Likelihood Automatic Word-tagging System) algorithm, TBED (Transformation-Based Error-Driven), segmentation, Brill method
Published online: 17 December 2001