Article published In:
Interpreting: Online-First Articles

Automatic subtitles increase accuracy and decrease cognitive load in simultaneous interpreting
This study examines the effect of real-time subtitles generated by automatic speech recognition (ASR) technology on interpreting accuracy and interpreters' cognitive load. Multiple measures were applied: interpreting accuracy, the NASA-TLX for subjective ratings of cognitive load, eye-tracking, and theta power derived from EEG recordings. Twenty-three professional simultaneous interpreters interpreted a video-recorded speech in five conditions: a baseline without subtitles, then four conditions with subtitles of varying precision (100%, 95%, 90% and 80%). The results reveal that the presence of subtitles significantly improved interpreting accuracy, and suggest that a precision rate of 90% or higher is optimal. Regardless of subtitle precision, the interpreters looked at the subtitles more than at the speaker. Contrary to our predictions, the presence of subtitles decreased, rather than increased, cognitive load, although this outcome was shown only by the EEG data and not by the self-reported data. We conclude that the cognitive cost of processing subtitles as an additional information channel is offset by the cognitive gain achieved through visual prompting. The study highlights a complex effect of subtitles on interpreting, with factors such as subtitle presence and precision modulating interpreters' cognitive load in this workflow.
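The abstract does not define how subtitle precision was operationalised. A common convention in ASR evaluation is word accuracy, 1 − WER, where the word error rate (WER) is the number of word substitutions, deletions and insertions separating the subtitle from the reference transcript, divided by the number of reference words. The sketch below assumes that convention; `word_edit_distance` and `subtitle_precision` are hypothetical names, not the study's procedure, and text normalisation is reduced to lowercasing and whitespace splitting.

```python
def word_edit_distance(reference: list[str], hypothesis: list[str]) -> int:
    """Levenshtein distance on words: the minimum number of substitutions,
    deletions and insertions separating `hypothesis` from `reference`."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current[j] = min(
                previous[j] + 1,         # reference word missing from subtitle (deletion)
                current[j - 1] + 1,      # extra word in subtitle (insertion)
                previous[j - 1] + cost,  # match or substitution
            )
        previous = current
    return previous[-1]


def subtitle_precision(reference: str, hypothesis: str) -> float:
    """Assumed precision measure: word accuracy = 1 - WER, clamped at 0."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference transcript must not be empty")
    wer = word_edit_distance(ref_words, hyp_words) / len(ref_words)
    # WER can exceed 1 when the subtitle contains many inserted words, so clamp.
    return max(0.0, 1.0 - wer)


if __name__ == "__main__":
    reference = "the election results were announced late on tuesday in nairobi"
    subtitle = "the election results were announced late on thursday in nairobi"
    print(subtitle_precision(reference, subtitle))  # one substitution in ten words: prints 0.9
```

Under this assumption, the 95%, 90% and 80% conditions would correspond to roughly 5, 10 and 20 word errors per 100 reference words.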
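The NASA-TLX combines six subscale ratings (mental demand, physical demand, temporal demand, performance, effort and frustration), each on a 0–100 scale, into one workload score. The abstract does not say whether the raw (unweighted) or the weighted variant was used, so the sketch below shows both; the dictionary keys, function names and example ratings are illustrative assumptions, not data from the study.

```python
# Standard NASA-TLX subscales; the key spellings are illustrative.
SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def _check_ratings(ratings: dict[str, float]) -> None:
    missing = [name for name in SUBSCALES if name not in ratings]
    if missing:
        raise ValueError(f"missing subscale ratings: {missing}")
    for name in SUBSCALES:
        if not 0 <= ratings[name] <= 100:
            raise ValueError(f"{name} rating must lie between 0 and 100")


def raw_tlx(ratings: dict[str, float]) -> float:
    """Raw TLX: unweighted mean of the six subscale ratings.
    On the standard form a higher 'performance' rating means poorer
    perceived performance, so all six ratings point the same way."""
    _check_ratings(ratings)
    return sum(ratings[name] for name in SUBSCALES) / len(SUBSCALES)


def weighted_tlx(ratings: dict[str, float], weights: dict[str, int]) -> float:
    """Weighted TLX: each weight counts how often a subscale was chosen
    in the 15 pairwise comparisons, so the weights must sum to 15."""
    _check_ratings(ratings)
    if sum(weights.get(name, 0) for name in SUBSCALES) != 15:
        raise ValueError("weights from 15 pairwise comparisons must sum to 15")
    return sum(ratings[name] * weights.get(name, 0) for name in SUBSCALES) / 15


if __name__ == "__main__":
    # Invented ratings for one participant and one condition.
    ratings = {"mental": 70, "physical": 10, "temporal": 60,
               "performance": 40, "effort": 65, "frustration": 35}
    weights = {"mental": 5, "physical": 0, "temporal": 4,
               "performance": 2, "effort": 3, "frustration": 1}
    print(round(raw_tlx(ratings), 2))      # 46.67
    print(weighted_tlx(ratings, weights))  # 60.0
```

The raw variant skips the pairwise-comparison step, which shortens administration when ratings are collected after each of several conditions.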
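The abstract treats EEG theta power as an index of cognitive load. Theta power is the spectral power in roughly the 4–8 Hz band (band limits vary between studies). The sketch below computes relative theta power for a single channel from a naive discrete Fourier transform, so it needs only the standard library; the 128 Hz sampling rate, the 4–8 Hz limits and the function names are assumptions, and the artefact removal, epoching and electrode selection of a real EEG pipeline are omitted.

```python
import cmath
import math


def periodogram(signal: list[float], fs: float) -> list[tuple[float, float]]:
    """Unnormalised one-sided power spectrum via a naive O(n^2) DFT.

    Returns (frequency in Hz, |X_k|^2) for bins k = 1 .. n // 2; the DC bin
    is skipped because the signal mean is removed first.
    """
    n = len(signal)
    mean = sum(signal) / n
    centred = [x - mean for x in signal]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(centred[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append((k * fs / n, abs(coeff) ** 2))
    return spectrum


def relative_theta_power(signal: list[float], fs: float = 128.0,
                         low: float = 4.0, high: float = 8.0) -> float:
    """Share of total spectral power falling in the [low, high) Hz band."""
    spectrum = periodogram(signal, fs)
    total = sum(power for _, power in spectrum)
    if total == 0:
        raise ValueError("signal has no variance")
    theta = sum(power for freq, power in spectrum if low <= freq < high)
    return theta / total


if __name__ == "__main__":
    fs = 128.0
    times = [i / fs for i in range(256)]  # 2 s of simulated signal
    # 6 Hz (theta) component plus a weaker 10 Hz (alpha) component.
    eeg = [math.sin(2 * math.pi * 6 * s) + 0.5 * math.sin(2 * math.pi * 10 * s) for s in times]
    print(round(relative_theta_power(eeg, fs), 2))  # 0.8
```

With real recordings, a library routine for Welch's method (for example `scipy.signal.welch`) would normally replace the naive DFT loop, since it is faster and averages out noise across overlapping segments.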
Keywords: simultaneous interpreting, automatic speech recognition, ASR, live subtitling, cognitive load, electroencephalography, EEG
Article outline
- 1. Introduction
- 2. ASR subtitles in interpreting
- 3. Measurement of cognitive load in interpreting with ASR subtitles
- 4. The present study
- 4.1 Participants
- 4.2 Materials
- 4.3 Procedure
- 5. Results
- 5.1 Accuracy
- 5.2 Self-reported cognitive load (NASA-TLX)
- 5.3 Eye-tracking
- 5.4 Theta power
- 6. Discussion
- 6.1 Accuracy
- 6.2 Self-reported cognitive load (NASA-TLX)
- 6.3 Eye-tracking
- 6.4 Theta power
- 7. Conclusions
- Notes
- References
Published online: 16 September 2024
https://doi.org/10.1075/intp.00111.li