Vol. 57:3 (2022), pp. 238–269
Building a corpus of spoken Chinese interlanguage and some results of preliminary analyses
The corpus of spoken Chinese interlanguage in this study consists of over one million characters of transcribed student speech, collected over nearly ten years of study abroad research. The main research method was a comparison with a similar corpus of Chinese spoken by native speakers. Preliminary analyses show that 11 of the 20 most frequent words are shared by the learner corpus and the native corpus. Learners used some grammatical function words, such as 把 (bǎ), 了 (le), 它 (tā), and 着 (zhe), less than native speakers did, while they used others, such as 我 (wǒ) and 的 (de), much more frequently. Possible explanations for these patterns, as well as pedagogical implications and directions for further research, are discussed.
Article outline
- Introduction
- Literature review
- Corpus Linguistics in second language acquisition and teaching research
- Spoken corpora of L2 Chinese
- Developing a corpus of spoken Chinese interlanguage
- Chinese learning background of the students
- Program in China
- Data collection
- Stage I
- Stage II
- Stage III
- Building the corpus
- Analyzing data in corpus linguistics research
- Preliminary analyses
- Theoretical background: The Contrastive Interlanguage Analysis (CIA)
- Research Questions
- Method
- Native reference corpus
- Word segmentation
- Research tool
- Results
- Wordlists and some initial observations
- Keywords
- Comparisons of 把, 了, and 着
- 我
- 的
- 它
- 很 vs. 挺
- 是…的
- Discussion
- 我
- 的
- 它
- 很 vs. 挺
- Pedagogical implications
- Helping students address the underuse issue
- Helping students develop pragmatic competence
- Limitations and further research
- Conclusions
- Acknowledgements
- Notes
- References
https://doi.org/10.1075/csl.18015.du