This article reviews corpus-based Chinese studies, both applied and theoretical, from the 1920s to the present. It will be shown that, while corpus-based Chinese studies have been gaining momentum for only the last couple of decades, the roots of Chinese corpus linguistics go all the way back to the beginning of the 20th century. Today the bulk of corpus-based Chinese studies is oriented toward applied linguistics, with the compilation of character and word frequency lists and studies of Chinese interlanguage being the most popular types of research. In addition to applied linguistic studies, this overview also highlights some innovative corpus studies on lexical and grammatical aspects of both classical and modern Chinese, as well as studies of sociolinguistic variation and discourse pragmatics. Overall, important groundwork in Chinese corpus linguistics is acknowledged and future directions are discussed.