Part of
Interpreting Technologies – Current and Future Trends
Edited by Gloria Corpas Pastor and Bart Defrancq
[IVITRA Research in Linguistics and Literature 37] 2023
pp. 169–194
Aarabi, Parham
2003 “The fusion of distributed microphone arrays for sound localization”. EURASIP Journal on Advances in Signal Processing 2003 (4): 338–347.
Abowd, Gregory D.
1999 “Classroom 2000: An experiment with the instrumentation of a living educational environment”. IBM Systems Journal 38 (4): 508–530.
Anidjar, Or Haim, Hajaj, Chen, Dvir, Amit, and Issachar Gilad
2020 “A thousand words are worth more than one recording: NLP based speaker change point detection”. [Online] Available at arXiv:2006.01206v1.
Apostolidis, Evlampios, Adamantidou, Eleni, Metsai, Alexandros I., Mezaris, Vasileios, and Ioannis Patras
2021 “Video Summarization Using Deep Neural Networks: A Survey”. Proceedings of the IEEE. [Online] Available at arXiv:2101.06072.
Aronowitz, Hagai, Zhu, Weizhong, Suzuki, Masayuki, Kurata, Gakuto, and Ron Hoory
2020 “New advances in speaker diarisation”. INTERSPEECH 2020. 279–283.
Bazzi, Issam, and James R. Glass
2000 “Modeling out-of-vocabulary words for robust speech recognition”. In Proceedings of the 6th International Conference on Spoken Language Processing (ICSLP 2000). 401–404.
Besle, Julien, Fort, Alexandra, Delpuech, Claude, and Marie-Hélène Giard
2004 “Bimodal speech: Early suppressive visual effects in human auditory cortex”. European Journal of Neuroscience 20: 2225–2234.
Braun, Sabine
2015 “Remote Interpreting”. In The Routledge Handbook of Interpreting, edited by Holly Mikkelson and Renée Jourdenais, 352–367. New York: Routledge.
2020 “‘You are just a disembodied voice really’. Perceptions of video remote interpreting by legal interpreters and police officers”. In Linking up with video: Perspectives on interpreting practice and research, edited by Heidi Salaets and Geert Brône, 203–233. Amsterdam: John Benjamins.
Burger, Susanne, MacLaren, Victoria, and Hua Yu
2002 “The ISL meeting corpus: the impact of meeting type on speech style”. In: Proceedings of the 7th International Conference on Spoken Language Processing (ICSLP 2002). 301–304.
Chiu, Patrick, Boreczky, John, Girgensohn, Andreas, and Don Kimber
2001 “LiteMinutes: an Internet-based system for multimedia meeting minutes”. In Proceedings of the 10th international conference on World Wide Web (WWW2001). 140–149. Hong Kong, CN.
Choe, Sang Keun, Lu, Quanyang, Raunak, Vikas, Xu, Yi, and Florian Metze
2019 “On Leveraging Visual Modality for Speech Recognition Error Correction”. Proceedings ICML 2019.
Clark, Herbert H., and Thomas B. Carlson
1982 “Hearers and speech acts”. Language 58 (2): 332–373.
Coen, Michael H.
1999 “The future of human-computer interaction, or how I learned to stop worrying and love my intelligent room”. IEEE Intelligent Systems 14 (5): 8–10.
Constable, Andrew
2015 Distance Interpreting: A Nuremberg Moment for our Time. AIIC 2015 Assembly Day 3: Debate on Remote.
Corpas Pastor, Gloria
2021 “Interpreting and Technology: Is the Sky Really the Limit?”. In Proceedings of the Translation and Interpreting Technology Online Conference, edited by Ruslan Mitkov, Vilelmini Sosoni, Julie Christine Giguere, Elena Murgolo, and Elisabeth Deysel, 15–24. Shumen: Incoma.
2022a “Interpreting tomorrow? How to build a computer-assisted glossary of phraseological units in (almost) no time”. In Computational and Corpus-Based Phraseology Fourth International Conference, Europhras 2022, Malaga, Spain, September 28–30, 2022, Proceedings, edited by Gloria Corpas Pastor and Ruslan Mitkov, 62–77. Berlin: Springer.
2022b “Technology Solutions for Interpreters: The VIP System”. Hermēneus. Revista de Traducción e Interpretación 23: 91–123.
Corpas Pastor, Gloria, and Lily May Fern
2016 A Survey of Interpreters’ Needs and Practices Related to Language Technology. Technical report. Malaga: University of Malaga.
Cutler, Ross, Rui, Yong, Gupta, Anoop, Cadiz, J. J., Tashev, Ivan, He, Li-wei, Colburn, Alex, Zhang, Zhengyou, Liu, Zicheng, and Steve Silverberg
2002 “Distributed meetings: A meeting capture and broadcasting system”. In Proceedings of the tenth ACM international conference on Multimedia. 503–512.
Davitti, Elena
2019 “Methodological explorations of interpreter-mediated interaction: novel insights from multimodal analysis”. Qualitative Research 19 (1): 7–29.
Defrancq, Bart, and Claudio Fantinuoli
Fantinuoli, Claudio
2017 “Computer-Assisted Preparation in Conference Interpreting”. Translation & Interpreting 9 (2): 24–37.
Foote, Jonathan T., Young, Steve J., Jones, Gareth J.F., and Karen Spärck Jones
1997 “Unconstrained keyword spotting using phone lattices with application to spoken document retrieval”. Computer Speech & Language 11 (3): 207–224.
Goodwin, Charles
1981 Conversational organization: Interaction between speakers and hearers. San Diego: Academic Press.
Gupta, Abhinav, Miao, Yajie, Neves, Leonardo, and Florian Metze
2017 “Visual features for context-aware speech recognition”. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE 2017, 5020–5024 [URL].
James, David A.
1995 The application of classical information retrieval techniques to spoken documents. Unpublished doctoral thesis. University of Cambridge, United Kingdom.
James, David A., and Steven J. Young
1994 “A fast lattice-based approach to vocabulary independent wordspotting”. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 1994). 377–381.
Jewitt, Carey
2014 “An Introduction to Multimodality”. In The Routledge Handbook of Multimodal Analysis, edited by Carey Jewitt (2nd ed), 15–30. London: Routledge.
Jia, Jiyou
2015 “Intelligent Tutoring Systems”. In: Encyclopedia of Educational Technology, edited by Mike Spector, 411–413. Thousand Oaks, CA, USA: Sage.
Kazman, Rick, Al-Halimi, Reem, Hunt, William, and Marilyn Mantei
1996 “Four paradigms for indexing video conferences”. IEEE MultiMedia 3 (1): 63–73.
Knapp, Mark L., Hall, Judith A., and Terrence Horgan
2013 Nonverbal Communication in Human Interaction. Boston: Wadsworth Publishing.
Koehn, Philipp
2010 Statistical Machine Translation. Cambridge: Cambridge University Press.
Kubala, Francis, Colbath, Sean, Liu, Daben, and John Makhoul
1999 “Rough‘n’Ready: a meeting recorder and browser”. ACM Computing Surveys (CSUR) 31 (2es): 7.
Lee, Dar-Shyang, Erol, Berna, Graham, Jamey, Hull, Jonathan J., and Norihiko Murata
2002 “Portable meeting recorder”. In Proceedings of the 10th ACM International Conference on Multimedia. 493–502.
Li, Haopeng, Ke, Qiuhong, Gong, Mingming, and Rui Zhang
2022 “Video Summarization Based on Video-text Modelling”. [Online] Available at arXiv:2201.02494
Li, Jinyu
2021 “Recent Advances in End-to-End Automatic Speech Recognition”. APSIPA Transactions on Signal and Information Processing. [Online] Available at arXiv:2111.01690.
Li, Ya, Campbell, Nick, and Jianhua Tao
2015 “Voice quality: not only about ‘you’ but also about ‘your interlocutor’”. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2015). 4739–4743.
Lin, Zhijie, Zhao, Zhou, Li, Haoyuan, Liu, Jinglin, Zhang, Meng, Zeng, Xingshan, and Xiaofei He
2021 “SimulLR: Simultaneous Lip Reading Transducer with Attention-Guided Adaptive Memory”. Proceedings of ACMMM 2021. [Online] Available at [URL].
Luhn, Hans Peter
1958 “The automatic creation of literature abstracts”. IBM Journal of Research and Development 2 (2): 159–165.
Macháček, Dominik, Žilinec, Matúš, and Ondřej Bojar
2021 “Lost in Interpreting: Speech Translation from Source or Interpreter?” In Proceedings of INTERSPEECH 2021, 30 August–3 September 2021, Brno, Czech Republic. Brno: ISCA. 2376–238.
Martínez, Aleix M.
2002 “Recognizing imprecisely localized, partially occluded, and expression variant faces from a single sample per class”. IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (6): 748–763.
Matusov, Evgeny, Wilken, Patrick, Bahar, Parnia, Schamper, Julian, Golik, Pavel, Zeyer, Albert, Silvestre-Cerdà, Joan Albert, Martínez-Villaronga, Adrià, Pesch, Hendrick, and Jan-Thorsten Peter
2018 “Neural Speech Translation at AppTek”. In Proceedings of the 15th International Conference on Spoken Language Translation. Brussels. International Conference on Spoken Language Translation, 104–111. [Online] Available at [URL]
Mazzawi, Hanna, Gonzalvo, Xavi, Kracun, Alexandar, Sridhar, Prashant, Subrahmanya, Niranjan A., Lopez-Moreno, Ignacio, Park, Hyun-jin, and Patrick Violette
2019 “Improving Keyword Spotting and Language Identification via Neural Architecture Search at Scale”. Proceedings of INTERSPEECH 15.
McCowan, Iain, Gatica-Perez, Daniel, Bengio, Samy, Lathoud, Guillaume, Barnard, Mark, and Dong Zhang
2005 “Automatic Analysis of Multimodal Group Actions in Meetings”. IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (3): 305–317.
Metze, Florian, Gieselmann, Petra, Holzapfel, Hartwig, Kluge, Tobias, Rogina, Ivica, Waibel, Alex, and Matthias Wölfel
2006 “The ‘FAME’ Interactive Space”. In Proceedings of Machine Learning for Multimodal Interaction (MLMI2006). 285–296.
Mondada, Lorenza
2016 “Challenges of multimodality: Language and the body in social interaction”. Journal of Sociolinguistics 20 (3): 336–366.
Moores, Zoe
2020 “Fostering access for all through respeaking at live events”. JoSTrans. The Journal of Specialised Translation 33: 176–211.
Morgan, Nathaniel, Baron, Don, Bhagat, Sonali, Carvey-Essenburg, Hannah, Dhillon, Rajdip, Edwards, Jane, Gelbart, David, Janin, Adam, Krupski, Ashley, Peskin, Barbara, Pfau, Thilo, Shriberg, Elizabeth, Stolcke, Andreas, and Chuck Wooters
2003 “Meetings about meetings: research at ICSI on speech in multiparty conversations”. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP2003). 740–743.
Moser-Mercer, Barbara
2005 “Remote Interpreting: Issues of Multi-Sensory Integration in a Multilingual Task”. Meta 50 (2): 727–738.
Müller, Cornelia, Cienki, Alan, Fricke, Ellen, Ladewig, Silva, McNeill, David, and Sedinha Tessendorf
(eds.) 2013 Body – Language – Communication: An International Handbook on Multimodality in Human Interaction. Vol. 1. Berlin and Boston: De Gruyter Mouton.
(eds.) 2014 Body – Language – Communication: An International Handbook on Multimodality in Human Interaction. Vol. 2. Berlin and Boston: De Gruyter Mouton.
Ng, Kenney, and Victor W. Zue
2000 “Subword-based approaches for spoken document retrieval”. Speech Communication 32 (3): 157–186.
Oviatt, Sharon, Schuller, Björn, Cohen, Philip R., Sonntag, Daniel, Potamianos, Gerasimos, and Antonio Krüger
(eds.) 2017 The Handbook of Multimodal-Multisensor Interfaces, Volume 1: Foundations, User Modeling, and Common Modality Combinations. ACM Books.
Padois, Thomas, Sgard, Franck C., Doutres, Olivier, and Alain Berry
2017 “Acoustic source localization using a polyhedral microphone array and an improved generalized cross-correlation technique”. Journal of Sound and Vibration 386: 82–99.
Park, Tae Jin, Kanda, Naoyuki, Dimitriadis, Dimitrios, Han, Kyu J., Watanabe, Shinji, and Shrikanth Narayanan
2022 “A Review of Speaker Diarization: Recent Advances with Deep Learning”. Computer Speech & Language 72: 101317.
Pentland, Alex, and Tracy Heibeck
2008 Honest signals: how they shape our world. Cambridge: MIT Press.
Pöchhacker, Franz
2016 Introducing Interpreting Studies (2nd edition). London and New York: Routledge.
2020 “‘Going Video’: Mediality and Multimodality in Interpreting Studies”. In Linking up with video: Perspectives on interpreting practice and research, edited by Heidi Salaets and Geert Brône, 13–45. Amsterdam: John Benjamins.
Qu, Leyuan, Weber, Cornelius, and Stefan Wermter
2020 “Multimodal Target Speech Separation with Voice and Face References”. INTERSPEECH 2020.
Ramanathan, Vignesh, Joulin, Armand, Liang, Percy, and Li Fei-Fei
2014 “Linking people in videos with ‘their’ names using coreference resolution”. In Proceedings of the 13th European conference on computer vision (ECCV), Springer, 95–110.
Rogina, Ivica, and Thomas Schaaf
2002 “Lecture and presentation tracking in an intelligent meeting room”. In Proceedings of the 4th IEEE International Conference on Multimodal Interfaces. 47–52.
Romero-Fresco, Pablo
2011 Subtitling Through Speech Recognition: Respeaking. (Translation Practices Explained). St Jerome Publishing.
2018 “Subtitling through speech recognition”. In The Routledge Handbook of Audiovisual Translation, edited by Luis Pérez-González, 96–113. London and New York: Routledge.
Rui, Yong, Gupta, Anoop, and Jonathan Grudin
2003 “Videography for telepresentations”. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 457–464.
Sandrelli, Annalisa
2020 “Interlingual respeaking and simultaneous interpreting in a conference setting: a comparison”. inTRAlinea Special Issue: Technology in Interpreter Education and Practice. [URL]
Schapire, Robert E.
2013 “Explaining AdaBoost”. In Empirical inference, 37–52. Berlin: Springer.
Sinclair, Mark
2016 Speech segmentation and speaker diarisation for transcription and translation. PhD thesis. University of Edinburgh, United Kingdom.
Spärck Jones, Karen
1999 “Automatic summarizing: factors and directions”. In Advances in Automatic Text Summarization, 1–12. Cambridge: MIT Press.
Specia, Lucia, Wang, Josiah, Lee, Sun Jae, Ostapenko, Alissa, and Pranava Madhyastha
2021 “Read, spot and translate”. Machine Translation 35: 145–165.
Stolbov, Mikhail
2015 “Application of microphone arrays for distant speech capture”. Scientific and Technical Journal of Information Technologies, Mechanics and Optics 15 (4): 661–675.
Sulubacak, Umut, Çağlayan, Ozan, Grönroos, Stig-Arne, Elliott, Desmond, Rouhe, Aku, Specia, Lucia, and Jörg Tiedemann
2020 “Multimodal machine translation through visuals and speech”. Machine Translation 34: 97–147.
Tür, Gokhan, Stolcke, Andreas, Voss, Lynn, Peters, Stanley, Hakkani-Tür, Dilek, Dowding, John, Favre, Benoit, Fernández, Raquel, Frampton, Matthew, Frandsen, Michael W., Frederickson, Clint, Graciarena, Martin, Kintzing, Donald, Leveque, Kyle, Mason, Shane, Niekrasz, John, Purver, Matthew, Riedhammer, Korbinian, Shriberg, Elizabeth, Tien, Jing, Vergyri, Dimitra, and Fang Yang
2010 “The CALO meeting assistant system”. IEEE Transactions on Audio, Speech, and Language Processing 18 (6): 1601–1611.
Vranjes, Jelena, and Geert Brône
2020 “Eye-tracking in interpreter-mediated talk: From research to practice”. In Linking up with video: Perspectives on interpreting practice and research, edited by Heidi Salaets and Geert Brône, 203–233. Amsterdam: John Benjamins.
Wactlar, Howard D., Kanade, Takeo, Smith, Michael A., and Scott M. Stevens
1996 “Intelligent access to digital video: Informedia project”. Computer 29 (5): 46–52.
Wadensjö, Cecilia
1999 “Telephone interpreting and the synchronization of talk in social interaction”. The Translator 5 (2): 247–264.
Waibel, Alex, Bett, Michael, Metze, Florian, Ries, Klaus, Schaaf, Thomas, Schultz, Tanja, Soltau, Hagen, Yu, Hua, and Klaus Zechner
2001 “Advances in automatic meeting record creation and access”. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2001). 597–600.
Waibel, Alex, and Rainer Stiefelhagen
(eds.) 2009 Computers in the Human Interaction Loop. London: Springer.
Whittaker, Steve, Hyland, Patrick, and Myrtle Wiley
1994 “FILOCHAT: Handwritten notes provide access to recorded conversations”. In Proceedings of the SIGCHI conference on Human factors in computing systems. 271–277.
Witten, Ian H., Moffat, Alistair, and Timothy C. Bell
1999 Managing Gigabytes: Compressing and Indexing Documents and Images. San Francisco: Morgan Kaufmann.
Zhang, Xiaojun
2015 “The Changing Face of Conference Interpreting”. In New Horizons in Translation and Interpreting Studies. The 7th International Conference of the Iberian Association of Translation and Interpreting Studies (AIETI), edited by Gloria Corpas Pastor, Míriam Seghiri Domínguez, Rut Gutiérrez Florido, and Míriam Urbano Mendaña, 255–263. Geneva: Editions Tradulex. [Online] Available at [URL]
Zhao, Wen-Yi, Chellappa, Rama, Phillips, Jonathon, and Azriel Rosenfeld
2003 “Face Recognition: A Literature Survey”. ACM Computing Surveys (CSUR) 35 (4): 399–458.
Zhou, Bowen, Besacier, Laurent, and Yuqing Gao
2007 “On Efficient Coupling of ASR and SMT for Speech Translation”. 2007 IEEE International Conference on Acoustics, Speech and Signal Processing – ICASSP ’07, 2007. IV-101–IV-104.
Zhu, Wenwu, Wang, Xin, and Hongzhi Li
2020 “Multi-modal Deep Analysis for Multimedia”. IEEE Transactions on Circuits and Systems for Video Technology. [Online] Available at [URL].