Automatic subtitles increase accuracy and decrease cognitive load in simultaneous interpreting

Li, Tianyun; Chmiel, Agnieszka

doi:10.1075/intp.00111.li

Article published in:

Interpreting: Online-First Articles

Tianyun Li | Shandong University

Agnieszka Chmiel | Adam Mickiewicz University in Poznań

This study examines the effect of real-time subtitles generated by automatic speech recognition (ASR) technology on interpreting accuracy and interpreters’ cognitive load. Multiple measurements were applied: interpreting accuracy, the NASA-TLX for subjective ratings of cognitive load, eye-tracking, and theta power as indicated by EEG recordings. Twenty-three professional simultaneous interpreters worked with a video recording of a speech presented in five conditions: a baseline without subtitles and four conditions with subtitles of varying levels of precision (100%, 95%, 90% and 80%). The results reveal that the presence of subtitles significantly improved interpreting accuracy, with a suggested optimal precision rate of 90% or higher. The interpreters looked more at the subtitles, regardless of their level of precision, than at the speaker. Contrary to our predictions, the presence of subtitles decreased, rather than increased, the cognitive load, although this outcome was shown by the EEG data only and not by the self-reported data. We conclude that the cognitive cost of processing subtitles as an additional information channel is offset by the cognitive gain achieved through visual prompting. The study highlights a complex effect of subtitles on interpreting, with factors such as subtitle presence and precision modulating the interpreters’ cognitive load in this workflow.
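The abstract does not state how subtitle precision was quantified. As an illustration only, the sketch below assumes one common convention: word-level accuracy, that is, one minus the word error rate of the subtitle against a reference transcript. The function names and the example sentences are hypothetical and are not taken from the study.

```python
def word_edit_distance(reference, hypothesis):
    """Levenshtein distance between two word lists."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # word missing from the subtitle
                current[j - 1] + 1,      # extra word in the subtitle
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def subtitle_precision(reference_text, subtitle_text):
    """Word-level accuracy in [0, 1]; an assumed definition, not the authors' own."""
    reference = reference_text.lower().split()
    hypothesis = subtitle_text.lower().split()
    if not reference:
        raise ValueError("reference transcript is empty")
    errors = word_edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))


# Hypothetical ten-word transcript with one misrecognised word in the subtitle.
reference = "the election results were announced by the commission on friday"
subtitle = "the election results were announced by the omission on friday"
precision = subtitle_precision(reference, subtitle)
print(f"precision: {precision:.0%}")
```

Under this assumed definition, one wrong word in ten corresponds to the 90% condition, the lowest rate the abstract reports as still beneficial.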

Keywords: simultaneous interpreting, automatic speech recognition, ASR, live subtitling, cognitive load, electroencephalography, EEG

Article outline

1. Introduction
2. ASR subtitles in interpreting
3. Measurement of cognitive load in interpreting with ASR subtitles
4. The present study
- 4.1 Participants
- 4.2 Materials
- 4.3 Procedure
5. Results
- 5.1 Accuracy
- 5.2 Self-reported cognitive load (NASA-TLX)
- 5.3 Eye-tracking
- 5.4 Theta power
6. Discussion
- 6.1 Accuracy
- 6.2 Self-reported cognitive load (NASA-TLX)
- 6.3 Eye-tracking
- 6.4 Theta power
7. Conclusions
Notes
References

Published online: 16 September 2024

https://doi.org/10.1075/intp.00111.li

References

Abidi, O., Dženopoljac, V. & Safi, M. (2023). Online meeting tools, tacit knowledge sharing and entrepreneurial behaviours among knowledge workers during COVID-19. Knowledge Management Research & Practice 21 (6), 1137–1149.

Albl-Mikasa, M. (2010). Global English and English as a lingua franca (ELF): Implications for the interpreting profession. Trans-Kom 3 (2), 126–148.

Alexander, M. P., Benson, D. F. & Stuss, D. T. (1989). Frontal lobes and language. Brain and Language 37 (4), 656–691.

Amankwah-Amoah, J., Khan, Z., Wood, G. & Knight, G. (2021). COVID-19 and digitalization: The great acceleration. Journal of Business Research 136, 602–611.

Baranowska, K. (2020). Learning most with least effort: Subtitles and cognitive load. ELT Journal 74 (2), 105–115.

Boos, M., Kobi, M., Elmer, S. & Jäncke, L. (2022). The influence of experience on cognitive load during simultaneous interpretation. Brain and Language 234, 105185.

Castro-Meneses, L. J., Kruger, J.-L. & Doherty, S. (2020). Validating theta power as an objective measure of cognitive load in educational video. Educational Technology Research and Development 68 (1), 181–202.

Chen, S. (2017). The construct of cognitive load in interpreting and its measurement. Perspectives 25 (4), 640–657.

Cheung, A. K. F. (2008). Simultaneous interpreting of numbers: An experimental study. Forum 6 (2), 23–38.

Cheung, A. K. F. & Li, T. (2022). Machine aided interpreting: An experiment of automatic speech recognition in simultaneous interpreting. Translation Quarterly 104 (2), 1–20.

Chiu, C.-C., Sainath, T. N., Wu, Y., Prabhavalkar, R., Nguyen, P., Chen, Z., Kannan, A., Weiss, R. J., Rao, K., Gonina, E., Jaitly, N., Li, B., Chorowski, J. & Bacchiani, M. (2018). State-of-the-art speech recognition with sequence-to-sequence models. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4774–4778.

Chmiel, A., Janikowski, P. & Lijewska, A. (2020). Multimodal processing in simultaneous interpreting with text: Interpreters focus more on the visual than the auditory modality. Target 32 (1), 37–58.

CSIS. (2017, 9 September). Raila Odinga on the Kenyan elections. [URL]

D’Ausilio, A., Craighero, L. & Fadiga, L. (2012). The contribution of the frontal lobe to the perception of speech. Journal of Neurolinguistics 25 (5), 328–335.

Defrancq, B. & Fantinuoli, C. (2021). Automatic speech recognition in the booth: Assessment of system performance, interpreters’ performances and interactions in the context of numbers. Target 33 (1), 73–102.

Dehaene, S., Cohen, L., Sigman, M. & Vinckier, F. (2005). The neural code for written words: A proposal. Trends in Cognitive Sciences 9 (7), 335–341.

Delorme, A. & Makeig, S. (2004). EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods 134 (1), 9–21.

Desmet, B., Vandierendonck, M. & Defrancq, B. (2018). Simultaneous interpretation of numbers and the impact of technological support. In C. Fantinuoli (Ed.), Interpreting and technology. Berlin: Language Science Press, 13–27.

Díaz-Cintas, J. (2020). The name and nature of subtitling. In Ł. Bogucki & M. Deckert (Eds.), The Palgrave handbook of audiovisual translation and media accessibility. Cham: Palgrave Macmillan, 149–171.

Díaz Cintas, J. & Remael, A. (2014). Audiovisual translation: Subtitling. Abingdon: Routledge.

Frittella, F. M. (2021). CAI tool-supported SI of numbers: A theoretical and methodological contribution. International Journal of Interpreter Education 14 (1), 32–56.

Fujimoto, M. & Kawai, H. (2019). One-pass single-channel noisy speech recognition using a combination of noisy and enhanced features. Interspeech 2019, 486–490.

Fuster, J. M. (2015). The prefrontal cortex (5th ed.). London: Academic Press.

Gevins, A. & Smith, M. E. (2003). Neurophysiological measures of cognitive workload during human-computer interaction. Theoretical Issues in Ergonomics Science 4 (1–2), 113–131.

Gile, D. (1999). Testing the Effort Model’s tightrope hypothesis in simultaneous interpreting — A contribution. Hermes 12 (23), 153–172.

Gile, D. (2009). Basic concepts and models for interpreter and translator training (revised edition). Amsterdam: John Benjamins.

Grabner, R. H., Brunner, C., Leeb, R., Neuper, C. & Pfurtscheller, G. (2007). Event-related EEG theta and alpha band oscillatory responses during language translation. Brain Research Bulletin 72 (1), 57–65.

Hart, S. G. (2006). Nasa-Task Load Index (NASA-TLX); 20 years later. Proceedings of the Human Factors and Ergonomics Society Annual Meeting 50 (9), 904–908.

Johnson, E. B., Rees, E. M., Labuschagne, I., Durr, A., Leavitt, B. R., Roos, R. A. C., Reilmann, R., Johnson, H., Hobbs, N. Z., Langbehn, D. R., Stout, J. C., Tabrizi, S. J. & Scahill, R. I. (2015). The impact of occipital lobe cortical thickness on cognitive task performance: An investigation in Huntington’s Disease. Neuropsychologia 79, 138–146.

Kafle, S. & Huenerfauth, M. (2016). Effect of speech recognition errors on text understandability for people who are deaf or hard of hearing. The 7th Workshop on Speech and Language Processing for Assistive Technologies (SLPAT 2016), 20–25.

Kalina, S. (1992). Discourse processing and interpreting strategies — An approach to the teaching of interpreting. In C. Dollerup & A. Loddegaard (Eds.), Teaching translation and interpreting. Amsterdam: John Benjamins, 251–257.

Klimesch, W., Schack, B. & Sauseng, P. (2005). The functional significance of theta and upper alpha oscillations. Experimental Psychology 52 (2), 99–108.

Lee, S.-B. (2018). Exploring a relationship between students’ interpreting self-efficacy and performance: Triangulating data on interpreter performance assessment. The Interpreter and Translator Trainer 12 (2), 166–187.

Lemhöfer, K. & Broersma, M. (2012). Introducing LexTALE: A quick and valid Lexical Test for Advanced Learners of English. Behavior Research Methods 44 (2), 325–343.

Lenzo, K. (1993, 16 September). The CMU pronouncing dictionary. [URL]

Liao, S., Kruger, J.-L. & Doherty, S. (2020). The impact of monolingual and bilingual subtitles on visual attention, cognitive load, and comprehension. The Journal of Specialised Translation 33 (1), 70–98.

Lin, X. (2013). An empirical study on computer aided interpretation from English to Chinese [Master’s thesis, Shandong Normal University]. [URL]

Locke, E. A., Frederick, E., Lee, C. & Bobko, P. (1984). Effect of self-efficacy, goals, and task strategies on task performance. Journal of Applied Psychology 69 (2), 241–251.

Ludersdorfer, P., Kronbichler, M. & Wimmer, H. (2015). Accessing orthographic representations from speech: The role of left ventral occipitotemporal cortex in spelling. Human Brain Mapping 36 (4), 1393–1406.

Mackintosh, J. (2003). The AIIC workload study. Forum 1 (2), 189–214.

Malakul, S. & Park, I. (2023). The effects of using an auto-subtitle system in educational videos to facilitate learning for secondary school students: Learning comprehension, cognitive load, and satisfaction. Smart Learning Environments 10 (1), 4.

Mellinger, C. D. & Hanson, T. A. (2024, 15 June). Cognitive load scales in CTIS: A systematic review. The Third Meeting of the Bertinoro Translation Society (BTS3), Bertinoro, Italy.

Mognon, A., Jovicich, J., Bruzzone, L. & Buiatti, M. (2011). ADJUST: An automatic EEG artifact detector based on the joint use of spatial and temporal features. Psychophysiology 48 (2), 229–240.

Nacimiento-García, E., González-González, C. S. & Gutiérrez-Vela, F. L. (2023). Automatic captions on video calls: A must for the older adults. Universal Access in the Information Society.

Nomura, S., Mizuno, T., Nozawa, A., Asano, H. & Ide, H. (2009). Salivary cortisol as a new biomarker for a mild mental workload. 2009 International Conference on Biometrics and Kansei Engineering, 127–131.

Orken, M., Dina, O., Keylan, A., Tolganay, T. & Mohamed, O. (2022). A study of transformer-based end-to-end speech recognition system for Kazakh language. Scientific Reports 12 (1), 8337.

O’Sullivan, C. & Cornu, J.-F. (2018). History of audiovisual translation. In L. Pérez-González (Ed.), The Routledge handbook of audiovisual translation. Abingdon: Routledge, 15–30.

Paas, F. G. W. C. (1992). Training strategies for attaining transfer of problem-solving skill in statistics: A cognitive-load approach. Journal of Educational Psychology 84 (4), 429–434.

Pisani, E. & Fantinuoli, C. (2021). Measuring the impact of automatic speech recognition on number rendition in simultaneous interpreting. In B. Zheng & C. Wang (Eds.), Empirical studies of translation and interpreting: The post-structuralist approach. Abingdon: Routledge, 181–197.

Pöchhacker, F. (2004). Introducing interpreting studies. London/New York: Routledge.

Prandi, B. (2018). An exploratory study on CAI tools in simultaneous interpreting: Theoretical framework and stimulus validation. In C. Fantinuoli (Ed.), Interpreting and technology. Berlin: Language Science Press, 29–59.

Puma, S., Matton, N., Paubel, P.-V., Raufaste, É. & El-Yagoubi, R. (2018). Using theta and alpha band power to assess cognitive workload in multitasking environments. International Journal of Psychophysiology 123, 111–120.

R Core Team. (2020). R: A language and environment for statistical computing. [URL]

Rinne, J. O., Tommola, J., Laine, M., Krause, B. J., Schmidt, D., Kaasinen, V., Teräs, M., Sipilä, H. & Sunnari, M. (2000). The translating brain: Cerebral activation patterns during simultaneous interpreting. Neuroscience Letters 294 (2), 85–88.

Romero-Fresco, P. & Eugeni, C. (2020). Live subtitling through respeaking. In Ł. Bogucki & M. Deckert (Eds.), The Palgrave handbook of audiovisual translation and media accessibility. Cham: Palgrave Macmillan, 269–295.

Scott, B. (2003). Automatic readability checker. Readability Formulas. [URL]

Seeber, K. G. (2011). Cognitive load in simultaneous interpreting: Existing theories — new models. Interpreting 13 (2), 176–204.

Seeber, K. G. (2017). Multimodal processing in simultaneous interpreting. In J. W. Schwieter & A. Ferreira (Eds.), The handbook of translation and cognition. Hoboken, NJ: Wiley, 461–475.

Seeber, K. G., Keller, L. & Hervais-Adelman, A. (2020). When the ear leads the eye — the use of text during simultaneous interpretation. Language, Cognition and Neuroscience 35 (10), 1480–1494.

Setton, R. (1999). Simultaneous interpretation: A cognitive-pragmatic analysis. Amsterdam: John Benjamins.

Stone, J. V. (2002). Independent component analysis: An introduction. Trends in Cognitive Sciences 6 (2), 59–64.

Sun, H., Li, K. & Lu, J. (2021). AI-assisted simultaneous interpreting: An experiment and its implications. Technology Enhanced Foreign Language Education 06, 75–80+86+12.

Szarkowska, A. & Gerber-Morón, O. (2019). Two or three lines: A mixed-methods study on subtitle processing and preferences. Perspectives 27 (1), 144–164.

The BBC Academy. (2022, July). BBC subtitle guidelines. [URL]

Tran, Y., Craig, A., Craig, R., Chai, R. & Nguyen, H. (2020). The influence of mental fatigue on brain activity: Evidence from a systematic review with meta-analyses. Psychophysiology 57 (5).

Van Rossum, G. & Drake, F. L. (2009). Python 3 reference manual. CreateSpace.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L. & Polosukhin, I. (2017). Attention is all you need. Proceedings of the 31st International Conference on Neural Information Processing Systems, 6000–6010.

Wang, C., Wu, Y., Lu, L., Liu, S., Li, J., Ye, G. & Zhou, M. (2020). Low latency end-to-end streaming speech recognition with a scout network. Interspeech 2020, 2112–2116.

Weiss, S. & Mueller, H. M. (2003). The contribution of EEG coherence to the investigation of language. Brain and Language 85 (2), 325–343.

Williams, N. S., McArthur, G. M., de Wit, B., Ibrahim, G. & Badcock, N. A. (2020). A validation of Emotiv EPOC Flex saline for EEG and ERP research. PeerJ 8, e9713.

Yuan, L. & Wang, B. (2023). Cognitive processing of the extra visual layer of live captioning in simultaneous interpreting: Triangulation of eye-tracked process and performance data. Ampersand 11, 100131.

Zekveld, A. A., Kramer, S. E., Kessens, J. M., Vlaming, M. S. M. G. & Houtgast, T. (2009). The influence of age, hearing, and working memory on the speech comprehension benefit derived from an automatic speech recognition system. Ear & Hearing 30 (2), 262–272.

Zhang, Y., Qin, J., Park, D. S., Han, W., Chiu, C.-C., Pang, R., Le, Q. V. & Wu, Y. (2020). Pushing the limits of semi-supervised learning for automatic speech recognition.

Zhang, Z. (2019). Spectral and time-frequency analysis. In L. Hu & Z. Zhang (Eds.), EEG signal processing and feature extraction. Singapore: Springer, 89–116.