Assessing pronunciation using dictation tools
The use of Google Voice Typing to score a pronunciation placement test
Language institutions need efficient and reliable placement tests to ensure students are placed in appropriate classes.
This can be achieved by automating the scoring of pronunciation tests with speech recognition, whose reliability has been shown
to be comparable to that of human raters. However, this technology can be costly as it requires development and maintenance, placing it
beyond the means of many institutions. This study investigates the feasibility of assessing second language (L2) English pronunciation in placement tests with a free automatic speech recognition tool, Google Voice Typing (GVT). We compared human-rated and
GVT-rated scores of 56 pronunciation placement tests. Our results indicate strong correlations between the two sets of scores, both for the final rating and for each criterion on the rubric used by human raters. We conclude that leveraging this free speech technology could increase the test
usefulness of language placement tests.
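To make the comparison concrete, the sketch below illustrates one way a human-versus-ASR comparison of this kind could be set up: each GVT transcription is scored against the target sentence by word accuracy, and the resulting scores are correlated with human ratings. The example data, the word-accuracy metric, and the use of Spearman's rho are illustrative assumptions only, not the scoring procedure used in the study.

```python
# Hypothetical sketch of comparing ASR-derived scores with human ratings.
# The data, metric, and correlation choice are invented for illustration;
# this is NOT the authors' actual scoring procedure.

from difflib import SequenceMatcher
from scipy.stats import spearmanr

def word_accuracy(target: str, transcription: str) -> float:
    """Proportion of target words recovered in the ASR transcription (illustrative metric)."""
    target_words = target.lower().split()
    transcribed_words = transcription.lower().split()
    matcher = SequenceMatcher(None, target_words, transcribed_words)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(target_words) if target_words else 0.0

# Invented example: one read-aloud sentence per test taker.
target = "the weather was surprisingly warm for early november"
gvt_transcriptions = [
    "the weather was surprisingly warm for early november",
    "the weather was surprising warm for early november",
    "the water was surprising one for early november",
]
human_scores = [5, 4, 2]  # hypothetical final ratings on the human raters' rubric

gvt_scores = [word_accuracy(target, t) for t in gvt_transcriptions]
rho, p_value = spearmanr(human_scores, gvt_scores)
print(f"GVT-derived scores: {gvt_scores}")
print(f"Spearman correlation with human ratings: rho={rho:.2f}, p={p_value:.3f}")
```

A rank-based correlation is shown here only because rubric ratings are ordinal; the study's own analysis may use a different statistic.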
Article outline
- 1. Introduction
- 2. Background
- 2.1 Human rater biases when assessing pronunciation
- 2.2 Automatic Speech Recognition (ASR)
- 2.3 Current use of automated assessment of pronunciation
- 2.4 ASR-based dictation tools and L2 pronunciation
- 2.5 Google Voice Typing
- 2.6 Test usefulness
- 3. The study
- 4. Method
- 4.1 Overview
- 4.2 Research context and participants
- 4.3 Data collection materials
- 4.3.1 Pronunciation samples
- 4.3.2 Rubric
- 4.4 Procedure
- 4.4.1 Pronunciation samples
- 4.4.2 Human-rated scores and analysis
- 4.4.3 GVT-rated scores and analysis
- 4.5 Data analysis
- 5. Results
- 6. Discussion
- 6.1 Reliability
- 6.2 Validity
- 6.3 Practicality
- 7. Conclusion
- Notes