Article in: Chinese as a Second Language (漢語教學研究—美國中文教師學會學報): Online-First Articles
Assessing the accuracy of Chinese speech-to-text tools for Chinese as foreign language learners
This article examines the effectiveness of four Chinese Speech-to-Text (CSTT) tools in transcribing the speech of Chinese
as a Foreign Language (CFL) learners across different ACTFL proficiency levels. The results indicate notable differences in
transcription accuracy. Among the CSTT tools, ChatGPT 3.5 proves to be the most accurate, followed by WeChat and Baidu IME, while
iOS IME shows the lowest performance. Except for iOS IME, these tools achieve 100% accuracy at the Distinguished and Superior
levels, where speech closely approximates native fluency. ChatGPT 3.5 excels from Novice to Distinguished levels but occasionally
overcorrects Novice-level CFL learners’ erroneous speech. WeChat performs robustly above the Novice level, while Baidu IME is best
at the Advanced level and above. Conversely, iOS IME displays significant limitations at all levels. This study offers new
perspectives on “good pronunciation” and the debate over handwriting versus typing Chinese characters for CFL learners.
Article outline
- Introduction
- Literature review
- Four CSTT tools
- Research questions
- Methodology
- Data collection
- Data analysis procedure
- Quantifying and qualifying the accuracy of CSTT tools
- Findings
- Superior and distinguished level
- Advanced level
- Intermediate level
- Novice level
- Discussion
- Performance variation of CSTT tools across proficiency levels
- ChatGPT’s transcription accuracy and correction capabilities
- WeChat and Baidu IME’s efficacy and educational implications
- Pedagogical implications and further studies
- Integrating AI-Assisted CSTT in Chinese language education
- Reevaluating the emphasis on pronunciation accuracy and the debate over handwriting Chinese characters through CSTT
- Further study
- Conclusion
- References
This content is being prepared for publication; it may be subject to changes.