Building a corpus of spoken Chinese interlanguage and some results of preliminary analyses

Du, Hang

doi:10.1075/csl.18015.du

Article published in:

Chinese as a Second Language (漢語教學研究—美國中文教師學會學報)
Vol. 57:3 (2022), pp. 238–269


Hang Du, Middlebury College

The corpus of spoken Chinese interlanguage in this study consists of more than one million characters of transcribed student speech, drawn from data collected over nearly ten years of study abroad research. The main research method was comparison with a similar corpus of spoken Chinese produced by native speakers. Preliminary analyses show that 11 of the 20 most frequent words are the same in the learner corpus and the native corpus. Learners used some grammatical function words, such as 把 (bǎ), 了 (le), 它 (tā), and 着 (zhe), less frequently than native speakers, and used others, such as 我 (wǒ) and 的 (de), much more frequently. Possible explanations for these patterns, as well as pedagogical implications and directions for further research, are discussed.
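The frequency comparisons summarized above can be illustrated with a minimal Python sketch. It assumes both corpora have already been word-segmented into space-delimited tokens, and it uses invented toy sentences rather than data from either corpus. It computes three things: the overlap between two top-n frequency lists, normalized frequency per 10,000 tokens, and Dunning's log-likelihood (G2), a keyness statistic commonly offered by corpus tools such as AntConc. The helper names (`top_n`, `shared_top_n`, `per_10k`, `log_likelihood`) and the choice of G2 are illustrative assumptions, not a description of the article's actual procedure.

```python
from collections import Counter
import math


def top_n(tokens, n):
    """The n most frequent word types; Counter orders equal counts by first occurrence."""
    return [word for word, _ in Counter(tokens).most_common(n)]


def shared_top_n(tokens_a, tokens_b, n=20):
    """Word types that appear in the top-n frequency lists of both corpora."""
    return set(top_n(tokens_a, n)) & set(top_n(tokens_b, n))


def per_10k(word, tokens):
    """Normalized frequency: occurrences of `word` per 10,000 tokens."""
    return Counter(tokens)[word] / len(tokens) * 10_000


def log_likelihood(a, b, c, d):
    """Dunning's log-likelihood (G2) for a word that occurs a times in a
    corpus of c tokens and b times in a corpus of d tokens."""
    expected_a = c * (a + b) / (c + d)
    expected_b = d * (a + b) / (c + d)
    g2 = 0.0
    # A zero observed count contributes nothing (0 * log 0 is taken as 0).
    if a > 0:
        g2 += a * math.log(a / expected_a)
    if b > 0:
        g2 += b * math.log(b / expected_b)
    return 2 * g2


# Hypothetical, invented toy data (already segmented, space-delimited);
# NOT drawn from the learner or native corpora described in the article.
learner = "我 觉得 我 的 老师 很 好 我 的 朋友 很 多".split()
native = "我 把 书 放 在 桌子 上 了 它 挺 好 的".split()

if __name__ == "__main__":
    print("shared top-3:", shared_top_n(learner, native, n=3))
    for word in ["我", "的", "把", "了", "它"]:
        a, b = learner.count(word), native.count(word)
        print(
            word,
            round(per_10k(word, learner), 1),   # learner rate per 10k tokens
            round(per_10k(word, native), 1),    # native rate per 10k tokens
            round(log_likelihood(a, b, len(learner), len(native)), 2),
        )
```

With real data, the token lists would come from the segmented transcripts. A G2 value above 3.84 corresponds to p < 0.05 with one degree of freedom, the conventional threshold for flagging a word as significantly over- or underused relative to the reference corpus.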

Keywords: corpus of spoken Chinese interlanguage, study abroad, grammatical function words, pragmatic competence

Article outline

Introduction
Literature review
- Corpus Linguistics in second language acquisition and teaching research
- Spoken corpora of L2 Chinese
Developing a corpus of spoken Chinese interlanguage
- Chinese learning background of the students
- Program in China
- Data collection
  - Stage I
  - Stage II
  - Stage III
- Building the corpus
- Analyzing data in corpus linguistics research
Preliminary analyses
- Theoretical background: The Contrastive Interlanguage Analysis (CIA)
- Research Questions
- Method
  - Native reference corpus
  - Word segmentation
  - Research tool
- Results
  - Wordlists and some initial observations
  - Keywords
- Comparisons of 把, 了, and 着
  - 我
  - 的
  - 它
  - 很 vs. 挺
  - 是…的
Discussion
- 我
- 的
- 它
- 很 vs. 挺
Pedagogical implications
- Helping students address the underuse issue
- Helping students develop pragmatic competence
Limitations and further research
Conclusions
Acknowledgements
Notes
References

Published online: 27 February 2023


References (62)

Adolphs, S., & Knight, D.

(2012) Building a spoken corpus: What are the basics? In A. O’Keeffe & M. McCarthy (Eds.), The Routledge handbook of corpus linguistics (38–52). New York: Routledge.

Aijmer, K.

(2002) Modality in advanced Swedish learners’ written interlanguage. In S. Granger, J. Hung, & S. Petch-Tyson (Eds.), Computer learner corpora, second language acquisition, and foreign language teaching (55–76). Amsterdam and Philadelphia: John Benjamins.

Anthony, L.

(2014) AntConc (Version 3.4.3) [Computer Software]. Tokyo, Japan: Waseda University. Available from [URL]

Ayoun, D.

(1996) The subset principle in second language acquisition. Applied Psycholinguistics, 17, 185–213.

Biq, Y-O.

(1990) The Chinese third-person pronoun in spoken discourse. Papers from the 26th regional meeting of the Chicago Linguistic Society, 11, 61–72.

Bourgerie, D. S.

(1996) Acquisition of modal particles in Chinese second language learners. In S. McGinnis (Ed.), Chinese pedagogy: An emerging field. Columbus, OH: The Ohio State University Foreign Language Publications.

Breyer, Y.

(2011) Corpora in language teaching and learning: Potential, evaluation, challenges. New York: Peter Lang.

Brezina, V., Gablasova, D., & McEnery, T.

(2019) Corpus-based approaches to spoken L2 production: Evidence from the Trinity Lancaster Corpus. International Journal of Learner Corpus Research, 5 (2), 119–125.

Chao, Y-R.

(1968) A grammar of spoken Chinese. Berkeley and Los Angeles: University of California Press.

Chomsky, N.

(1986) Knowledge of language: Its nature, origin, and use. New York: Praeger Publishers.

Coulmas, F.

(1989) The writing systems of the world. Oxford: Blackwell Publishers.

Diao, W.

(2016) Peer socialization into gendered L2 Mandarin practices in a study abroad context: Talk in the dorm. Applied Linguistics, 37 (5), 599–620.

Du, H.

(2013) The development of Chinese fluency during study abroad in China. The Modern Language Journal, 97, 131–143.

(2015) American college students studying abroad in China: Language, identity, and self-presentation. Foreign Language Annals, 48 (2), 250–266.

(2016) A corpus linguistics approach to the research and teaching of Chinese as a second language: The case of the ba-construction. In H. Tao (Ed.), Integrating Chinese linguistic research and language teaching and learning (13–31). Amsterdam: John Benjamins.

Evison, J.

(2012) What are the basics of analyzing a corpus? In A. O’Keeffe & M. McCarthy (Eds.), The Routledge handbook of corpus linguistics (122–135). New York: Routledge.

Gablasova, D., Brezina, V., & McEnery, T.

(2019) The Trinity Lancaster Corpus: Development, description and application. International Journal of Learner Corpus Research, 5 (2), 126–158.

Gilquin, G.

(2019) Light verb constructions in spoken L2 English: An exploratory cross-sectional study. International Journal of Learner Corpus Research, 5 (2), 181–206.

González-Lloret, M.

(2019) Technology and L2 pragmatics learning. Annual Review of Applied Linguistics, 39, 113–127.

Granger, S.

(Ed.) (1998) Learner English on computer. London and New York: Longman.

(2015) Contrastive interlanguage analysis: A reappraisal. International Journal of Learner Corpus Research, 1 (1), 7–24.

Granger, S., Gilquin, G., & Meunier, F.

(Eds.) (2015) The Cambridge handbook of learner corpus research. Cambridge: Cambridge University Press.

Granger, S., Hung, J., & Petch-Tyson, S.

(Eds.) (2002) Computer learner corpora, second language acquisition, and foreign language teaching. Amsterdam and Philadelphia: John Benjamins.

Huang, C., & Xue, N.

(2015) Modeling word concepts without convention: Linguistic and computational issues in Chinese word identification. In W. S.-Y. Wang & C. Sun (Eds.), The Oxford handbook of Chinese linguistics (348–361). Oxford: Oxford University Press.

Hunston, S.

(2002) Corpora in applied linguistics. Cambridge: Cambridge University Press.

Institute of Language Education, Beijing Language and Culture University

(1986) Xiandai Hanyu pinlü cidian [A frequency dictionary of Modern Standard Chinese]. Beijing: Beijing Language and Culture University Press.

Koester, A.

(2012) Building small specialized corpora. In A. O’Keeffe & M. McCarthy (Eds.), The Routledge handbook of corpus linguistics (66–79). New York: Routledge.

Leech, G.

(1998) Preface. In S. Granger (Ed.), Learner English on computer (xiv–xx). London and New York: Longman.

Li, C., & Thompson, S.

(1981) Mandarin Chinese: A functional reference grammar. Berkeley and Los Angeles: University of California Press.

Li, W.

(2006) 把话题链纳入汉语教学语法体系–汉语语篇特点在外语教学中的体现 [Incorporating topic chains into pedagogical grammar of Chinese]. Journal of the Chinese Language Teachers Association, 41 (1), 31–56.

Li, X.

(2010) Sociolinguistic variation in the speech of learners of Chinese as a second language. Language Learning, 60 (2), 366–408.

(2017) Stylistic variation in L1 and L2 Chinese: Native speakers, learners, teachers, and textbooks. Chinese as a Second Language, 52 (1), 55–76.

Liu, Y. 刘月华, Pan, W. 潘文娱, & Gu, W. 故韡

(2006) Shiyong xiandai Hanyu yufa 实用现代汉语语法 [Practical Grammar of Modern Chinese]. Beijing: Commercial Press.

Liu, Y., Yao, T., Bi, N-P., Ge, L., & Shi, Y.

(2016) Integrated Chinese 中文聽說讀寫: Traditional character textbook, Volume 1 (4th Ed.). Boston: Cheng & Tsui Company, Inc.

(2017) Integrated Chinese 中文聽說讀寫: Traditional character textbook, Volume 2 (4th Ed.). Boston: Cheng & Tsui Company, Inc.

Lorenz, G.

(1999) Adjective intensification – Learners vs. native speakers. A corpus study of argumentative writing. Amsterdam and Atlanta: Rodopi.

McEnery, T., Brezina, V., Gablasova, D., & Banerjee, J.

(2019) Corpus linguistics, learner corpora, and SLA: Employing technology to analyze language use. Annual Review of Applied Linguistics, 39, 74–92.

Ming, T., & Tao, H.

(2008) Developing a Chinese heritage language corpus: Issues and a preliminary report. In A. He & Y. Xiao (Eds.), Chinese as a heritage language: Fostering rooted world citizenry (167–187). Honolulu, HI: National Foreign Language Resource Center, University of Hawai’i.

Napoli, D. J.

(1993) Syntax: Theory and problems. New York and Oxford: Oxford University Press.

O’Keeffe, A., & McCarthy, M.

(Eds.) (2012) The Routledge handbook of corpus linguistics. New York: Routledge.

Paquot, M., & Plonsky, L.

(2017) Quantitative research methods and study quality in learner corpus research. International Journal of Learner Corpus Research, 3 (1), 61–94.

Polio, C.

(1995) Acquiring nothing? The use of zero pronouns by nonnative speakers of Chinese and the implications for the acquisition of nominal reference. Studies in Second Language Acquisition, 17, 353–377.

Ramsey, S.

(1987) The languages of China. Princeton, NJ: Princeton University Press.

Römer, U.

(2011) Corpus research applications in second language teaching. Annual Review of Applied Linguistics, 31, 205–225.

Schmidt, R.

(1993) Consciousness, learning and interlanguage pragmatics. In G. Kasper & S. Blum-Kulka (Eds.), Interlanguage pragmatics (43–57). New York: Oxford University Press.

(2001) Attention. In P. Robinson (Ed.), Cognition and second language instruction (3–32). Cambridge: Cambridge University Press.

Seidlhofer, B.

(2001) Closing a conceptual gap: The case for a description of English as a lingua franca. International Journal of Applied Linguistics, 11, 133–158.

Starr, R. L.

(2011) Variation in affective sentence-final particle use and transcription on Taiwanese Mandarin TV dramas. Paper presented at Symposium about Language and Society (SALSA) XIX. Austin, Texas.

Sun, C.

(2015) The use of De as a noun phrase marker. In W. S.-Y. Wang & C. Sun (Eds.), The Oxford handbook of Chinese linguistics (362–392). Oxford: Oxford University Press.

Taguchi, N.

(2015) Instructed pragmatics at a glance: Where instructional studies were, are, and should be going. Language Teaching, 48 (1), 1–50.

Tao, H.

(2000) Adverbs of absolute time and assertiveness in vernacular Chinese: A corpus-based study. Journal of the Chinese Language Teachers Association, 35 (2), 53–74.

(2005) The gap between natural speech and spoken Chinese teaching material: Toward a discourse approach to pedagogy. Journal of the Chinese Language Teachers Association, 40 (2), 1–24.

(2015a) Profiling the Mandarin spoken vocabulary based on corpora. In W. S.-Y. Wang & C. Sun (Eds.), The Oxford handbook of Chinese linguistics (336–347). Oxford: Oxford University Press.

(2015b) Teaching students to be discourse pragmatists: Practices in an L2 Chinese linguistics class. CHUN: Chinesischunterricht [CHUN: Chinese Language Teaching], 30, 30–51.

Tsao, F.

(1979) A functional study of topic in Chinese: The first step towards discourse analysis. Taipei: Student Book.

Wexler, K., & Manzini, M. R.

(1987) Parameters and learnability in binding theory. In T. Roeper & E. Williams (Eds.), Parameter setting (166–179). Dordrecht: D. Reidel.

Wu, R.-J.

(2004) Stance in talk: A conversation analysis of Mandarin final particles. Amsterdam: John Benjamins.

Xiao, R., Rayson, P., & McEnery, T.

(2009) A frequency dictionary of Mandarin Chinese: Core vocabulary for learners. New York: Taylor and Francis.

Yeung, L.

(2009) Use and misuse of “besides”: A corpus study comparing native speakers’ and learners’ English. System, 37 (2), 330–342.

Zhang, B. 张宝林, et al.

(2014) Ji yu yuliaoku de waiguoren Hanyu jushi xide yanjiu 基于语料库的外国人汉语句式习得研究 [A corpus-based study on the acquisition of Chinese sentence patterns by foreigners]. Beijing: Zhongguo Shuji Chubanshe.

Zhang, J.

(2014) A learner corpus study of L2 lexical development of Chinese resultative verb compounds. Journal of the Chinese Language Teachers Association, 49 (3), 1–24.

Zhang, J., & Tao, H.

(2018) Corpus-based research in Chinese as a second language. In C. Ke (Ed.), The Routledge handbook of Chinese second language acquisition (48–62). New York: Routledge.