This paper reports on the role of technology in state-of-the-art pronunciation research and instruction, and makes concrete
suggestions for future developments. The point of departure for this contribution is that the goal of second language (L2)
pronunciation research and teaching should be enhanced comprehensibility and intelligibility as opposed to native-likeness. Three
main areas are covered here. We begin with a presentation of advanced uses of pronunciation technology in research with a special
focus on the expertise required to carry out even small-scale investigations. Next, we discuss the nature of data in pronunciation
research, pointing to ways in which future work can build on advances in corpus research and crowdsourcing. Finally, we consider
how these insights pave the way for researchers and developers working to create research-informed, computer-assisted
pronunciation teaching resources. We conclude with predictions for future developments.