References
Aarabi, Parham. 2003. “The
fusion of distributed microphone arrays for sound
localization”. EURASIP Journal on Applied Signal Processing 2003 (4): 338–347.
Abowd, Gregory D. 1999. “Classroom
2000: An experiment with the instrumentation of a living educational
environment”. IBM Systems
Journal 38 (4): 508–530.
Anidjar, Or Haim, Hajaj, Chen, Dvir, Amit, and Issachar Gilad. 2020. “A
thousand words are worth more than one recording: NLP based speaker
change point
detection”. [Online] Available
at arXiv:2006.01206v1.
Apostolidis, Evlampios, Adamantidou, Eleni, Metsai, Alexandros I., Mezaris, Vasileios, and Ioannis Patras. 2021. “Video
Summarization Using Deep Neural Networks: A
Survey”. Proceedings of the
IEEE. [Online] Available
at arXiv:2101.06072.
Aronowitz, Hagai, Zhu, Weizhong, Suzuki, Masayuki, Kurata, Gakuto, and Ron Hoory. 2020. “New
advances in speaker
diarization”. In Proceedings of INTERSPEECH 2020. 279–283.
Bazzi, Issam, and James R. Glass. 2000. “Modeling
out-of-vocabulary words for robust speech
recognition”. In Proceedings
of the 6th International Conference on Spoken Language
Processing (ICSLP
2000). 401–404.
Besle, Julien, Fort, Alexandra, Delpuech, Claude, and Marie-Hélène Giard. 2004. “Bimodal
speech: Early suppressive visual effects in human auditory
cortex”. European Journal of
Neuroscience 20: 2225–2234.
Braun, Sabine. 2015. “Remote
Interpreting”. In The
Routledge Handbook of Interpreting, edited
by Holly Mikkelson, and Renée Jourdenais, 352–367. New York: Routledge.
Burger, Susanne, MacLaren, Victoria, and Hua Yu. 2002. “The
ISL meeting corpus: the impact of meeting type on speech
style”. In Proceedings
of the 7th International Conference on Spoken Language
Processing (ICSLP
2002). 301–304.
Chiu, Patrick, Boreczky, John, Girgensohn, Andreas, and Don Kimber. 2001. “LiteMinutes:
an Internet-based system for multimedia meeting
minutes”. In Proceedings
of the 10th international conference on World Wide
Web (WWW2001). 140–149. Hong Kong, CN.
Choe, Sang Keun, Lu, Quanyang, Raunak, Vikas, Xu, Yi, and Florian Metze. 2019. “On
Leveraging Visual Modality for Speech Recognition Error
Correction”. In Proceedings of ICML 2019.
Clark, Herbert H., and Thomas B. Carlson. 1982. “Hearers
and speech
acts”. Language 58 (2): 332–373.
Coen, Michael H. 1999. “The
future of human-computer interaction, or how I learned to stop
worrying and love my intelligent
room”. IEEE Intelligent
Systems 14 (5): 8–10.
Constable, Andrew. 2015. Distance
Interpreting: A Nuremberg Moment for our
Time. AIIC 2015 Assembly Day 3: Debate on
Remote.
Corpas Pastor, Gloria. 2021. “Interpreting
and Technology: Is the Sky Really the
Limit?”. In Proceedings
of the Translation and Interpreting Technology Online
Conference, edited
by Ruslan Mitkov, Vilelmini Sosoni, Julie Christine Giguere, Elena Murgolo, and Elisabeth Deysel, 15–24. Shumen: Incoma.
Corpas Pastor, Gloria. 2022a. “Interpreting
tomorrow? How to build a computer-assisted glossary of
phraseological units in (almost) no
time”. In Computational
and Corpus-Based Phraseology Fourth International Conference,
Europhras 2022, Malaga,
Spain, September 28–30,
2022, Proceedings, edited
by Gloria Corpas Pastor and Ruslan Mitkov, 62–77. Berlin: Springer.
Corpas Pastor, Gloria. 2022b. “Technology
Solutions for Interpreters: The VIP
System”. Hermēneus. Revista de
Traducción e
Interpretación 23: 91–123.
Corpas Pastor, Gloria, and Lily May Fern. 2016. A
Survey of Interpreters’ Needs and Practices Related to Language
Technology. Technical
report. Malaga: University of Malaga.
Cutler, Ross, Rui, Yong, Gupta, Anoop, Cadiz, J. J., Tashev, Ivan, He, Li-wei, Colburn, Alex, Zhang, Zhengyou, Liu, Zicheng, and Steve Silverberg. 2002. “Distributed
meetings: A meeting capture and broadcasting
system”. In Proceedings
of the tenth ACM international conference on
Multimedia. 503–512.
Davitti, Elena. 2019. “Methodological explorations of interpreter-mediated interaction: novel insights from multimodal analysis”. Qualitative Research 19 (1): 7–29.
Fantinuoli, Claudio. 2017. “Computer-Assisted
Preparation in Conference
Interpreting”. Translation &
Interpreting 9 (2): 24–37.
Foote, Jonathan T., Young, Steve J., Jones, Gareth J.F., and Karen Spärck Jones. 1997. “Unconstrained
keyword spotting using phone lattices with application to spoken
document retrieval”. Computer Speech
&
Language 11 (3): 207–224.
Goodwin, Charles. 1981. Conversational
organization: Interaction between speakers and
hearers. San Diego: Academic Press.
Gupta, Abhinav, Miao, Yajie, Neves, Leonardo, and Florian Metze. 2017. “Visual features for context-aware speech recognition”. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017). 5020–5024. [URL]
James, David A. 1995. The
application of classical information retrieval techniques to spoken
documents. Unpublished doctoral thesis. University of
Cambridge, United Kingdom.
James, David A., and Steven J. Young. 1994. “A
fast lattice-based approach to vocabulary independent
wordspotting”. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 1994). 377–381.
Jewitt, Carey. 2014. “An
Introduction to
Multimodality”. In The
Routledge Handbook of Multimodal
Analysis, 2nd ed., edited by Carey Jewitt, 15–30. London: Routledge.
Jia, Jiyou. 2015. “Intelligent
Tutoring
Systems”. In Encyclopedia
of Educational Technology, edited
by Mike Spector, 411–413. Thousand Oaks, CA, USA: Sage.
Kazman, Rick, Al-Halimi, Reem, Hunt, William, and Marilyn Mantei. 1996. “Four
paradigms for indexing video
conferences”. IEEE
MultiMedia 3 (1): 63–73.
Knapp, Mark L., Hall, Judith A., and Terrence Horgan. 2013. Nonverbal
Communication in Human
Interaction. Boston: Wadsworth Publishing.
Koehn, Philipp. 2010. Statistical
Machine
Translation. Cambridge: Cambridge University Press.
Kubala, Francis, Colbath, Sean, Liu, Daben, and John Makhoul. 1999. “Rough‘n’Ready:
a meeting recorder and browser”. ACM
Computing Surveys
(CSUR) 31(2es): 7.
Lee, Dar-Shyang, Erol, Berna, Graham, Jamey, Hull, Jonathan J., and Norihiko Murata. 2002. “Portable
meeting
recorder”. In Proceedings
of the 10th ACM International Conference on
Multimedia. 493–502.
Li, Haopeng, Ke, Qiuhong, Gong, Mingming, and Rui Zhang. 2022. “Video
Summarization Based on Video-text
Modelling”. [Online] Available
at arXiv:2201.02494.
Li, Jinyu. 2021. “Recent
Advances in End-to-End Automatic Speech
Recognition”. APSIPA Transactions on
Signal and Information
Processing. [Online] Available
at arXiv:2111.01690.
Li, Ya, Campbell, Nick, and Jianhua Tao. 2015. “Voice quality: not only about ‘you’ but also about ‘your interlocutor’”. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2015). 4739–4743.
Lin, Zhijie, Zhao, Zhou, Li, Haoyuan, Liu, Jinglin, Zhang, Meng, Zeng, Xingshan, and Xiaofei He. 2021. “SimulLR:
Simultaneous Lip Reading Transducer with Attention-Guided Adaptive
Memory”. Proceedings of ACMMM
2021. [Online] Available
at [URL].
Luhn, Hans Peter. 1958. “The
automatic creation of literature
abstracts”. IBM Journal of research
and
development 2 (2): 159–165.
Macháček, Dominik, Žilinec, Matúš, and Ondřej Bojar. 2021. “Lost
in Interpreting: Speech Translation from Source or
Interpreter?” In Proceedings of INTERSPEECH 2021, 30 August–3 September 2021, Brno, Czech Republic. ISCA. 2376–2380.
Martínez, Aleix M. 2002. “Recognizing imprecisely localized, partially occluded, and expression variant faces from a single sample per class”. IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (6): 748–763.
Matusov, Evgeny, Wilken, Patrick, Bahar, Parnia, Schamper, Julian, Golik, Pavel, Zeyer, Albert, Silvestre-Cerdà, Joan Albert, Martínez-Villaronga, Adrià, Pesch, Hendrik, and Jan-Thorsten Peter. 2018. “Neural
Speech Translation at
AppTek”. In Proceedings
of the 15th International Conference on Spoken Language
Translation. Brussels. International
Conference on Spoken Language
Translation, 104–111. [Online] Available
at [URL]
Mazzawi, Hanna, Gonzalvo, Xavi, Kracun, Alexandar, Sridhar, Prashant, Subrahmanya, Niranjan A., Lopez-Moreno, Ignacio, Park, Hyun-jin, and Patrick Violette. 2019. “Improving
Keyword Spotting and Language Identification via Neural Architecture
Search at
Scale”. In Proceedings of INTERSPEECH 2019.
McCowan, Iain, Gatica-Perez, Daniel, Bengio, Samy, Lathoud, Guillaume, Barnard, Mark, and Dong Zhang. 2005. “Automatic
Analysis of Multimodal Group Actions in
Meetings”. IEEE Transactions on
Pattern Analysis and Machine
Intelligence 27 (3): 305–317.
Metze, Florian, Gieselmann, Petra, Holzapfel, Hartwig, Kluge, Tobias, Rogina, Ivica, Waibel, Alex, and Matthias Wölfel. 2006. “The
‘FAME’ Interactive
Space”. In Proceedings
of Machine Learning for Multimodal Interaction
(MLMI2006). 285–296.
Mondada, Lorenza. 2016. “Challenges
of multimodality: Language and the body in social
interaction”. Journal of
Sociolinguistics 20 (3): 336–366.
Moores, Zoe. 2020. “Fostering
access for all through respeaking at live
events”. JOsTrans. The journal of
specialised
translation 33: 176–211.
Morgan, Nelson, Baron, Don, Bhagat, Sonali, Carvey-Essenburg, Hannah, Dhillon, Rajdip, Edwards, Jane, Gelbart, David, Janin, Adam, Krupski, Ashley, Peskin, Barbara, Pfau, Thilo, Shriberg, Elizabeth, Stolcke, Andreas, and Chuck Wooters. 2003. “Meetings
about meetings: research at ICSI on speech in multiparty
conversations”. In Proceedings
of IEEE International Conference on Acoustics, Speech, and Signal
Processing (ICASSP2003). 740–743.
Moser-Mercer, Barbara. 2005. “Remote
Interpreting: Issues of Multi-Sensory Integration in a Multilingual
Task”. Meta 50 (2): 727–738.
Müller, Cornelia, Cienki, Alan, Fricke, Ellen, Ladewig, Silva, McNeill, David, and Sedinha Tessendorf (eds.). 2013. Body –
Language – Communication: An International Handbook on Multimodality
in Human Interaction. Vol.
1. Berlin and Boston: De Gruyter Mouton.
Müller, Cornelia, Cienki, Alan, Fricke, Ellen, Ladewig, Silva, McNeill, David, and Sedinha Tessendorf (eds.). 2014. Body –
Language – Communication: An International Handbook on Multimodality
in Human Interaction. Vol.
2. Berlin and Boston: De Gruyter Mouton.
Ng, Kenney, and Victor W. Zue. 2000. “Subword-based
approaches for spoken document
retrieval”. Speech
Communication 32 (3): 157–186.
Oviatt, Sharon, Schuller, Björn, Cohen, Philip R., Sonntag, Daniel, Potamianos, Gerasimos, and Antonio Krüger (eds.). 2017. The
Handbook of Multimodal-Multisensor Interfaces, Volume 1:
Foundations, User Modeling, and Common Modality
Combinations. ACM Books.
Padois, Thomas, Sgard, Franck C., Doutres, Olivier, and Alain Berry. 2017. “Acoustic
source localization using a polyhedral microphone array and an
improved generalized cross-correlation
technique”. Journal of Sound and
Vibration 386: 82–99.
Park, Tae Jin, Kanda, Naoyuki, Dimitriadis, Dimitrios, Han, Kyu J., Watanabe, Shinji, and Shrikanth Narayanan. 2022. “A
Review of Speaker Diarization: Recent Advances with Deep
Learning”. Computer Speech &
Language 72: 101317.
Pentland, Alex, and Tracy Heibeck. 2008. Honest
signals: how they shape our
world. Cambridge: MIT Press.
Pöchhacker, Franz. 2016. Introducing Interpreting Studies. 2nd ed. London and New York: Routledge.
Qu, Leyuan, Weber, Cornelius, and Stefan Wermter. 2020. “Multimodal
Target Speech Separation with Voice and Face
References”. In Proceedings of INTERSPEECH 2020.
Ramanathan, Vignesh, Joulin, Armand, Liang, Percy, and Li Fei-Fei. 2014. “Linking
people in videos with ‘their’ names using coreference
resolution”. In Proceedings
of the 13th European conference on computer
vision (ECCV), Springer, 95–110.
Rogina, Ivica, and Thomas Schaaf. 2002. “Lecture
and presentation tracking in an intelligent meeting
room”. In Proceedings
of the 4th IEEE International Conference on Multimodal
Interfaces. 47–52.
Romero-Fresco, Pablo. 2011. Subtitling
Through Speech Recognition:
Respeaking. (Translation Practices
Explained). Manchester: St Jerome Publishing.
Romero-Fresco, Pablo. 2018. “Subtitling
through speech
recognition”. In The
Routledge Handbook of Audiovisual
Translation, edited
by Luis Pérez-González, 96–113. London and New York: Routledge.
Rui, Yong, Gupta, Anoop, and Jonathan Grudin. 2003. “Videography
for
telepresentations”. In Proceedings
of the SIGCHI Conference on Human Factors in Computing
Systems. 457–464.
Sandrelli, Annalisa. 2020. “Interlingual
respeaking and simultaneous interpreting in a conference setting: a
comparison”. inTRAlinea Special
Issue: Technology in Interpreter
Education and
Practice. [URL]
Schapire, Robert E. 2013. “Explaining
AdaBoost”. In Empirical Inference, 37–52. Berlin: Springer.
Sinclair, Mark. 2016. Speech
segmentation and speaker diarisation for transcription and
translation. PhD
thesis. University of
Edinburgh, United Kingdom.
Spärck Jones, Karen. 1999. “Automatic
summarizing: factors and
directions”. In Advances
in Automatic Text
Summarization, 1–12. Cambridge: MIT Press.
Specia, Lucia, Wang, Josiah, Lee, Sun Jae, Ostapenko, Alissa, and Pranava Madhyastha. 2021. “Read,
spot and translate”. Machine
Translation 35: 145–165.
Stolbov, Mikhail. 2015. “Application
of microphone arrays for distant speech
capture”. Scientific and Technical
Journal of Information Technologies, Mechanics and
Optics 15 (4): 661–675.
Sulubacak, Umut, Çağlayan, Ozan, Grönroos, Stig-Arne, Elliott, Desmond, Rouhe, Aku, Specia, Lucia, and Jörg Tiedemann. 2020. “Multimodal
machine translation through visuals and
speech”. Machine
Translation 34: 97–147.
Tür, Gokhan, Stolcke, Andreas, Voss, Lynn, Peters, Stanley, Hakkani-Tür, Dilek, Dowding, John, Favre, Benoit, Fernández, Raquel, Frampton, Matthew, Frandsen, Michael W., Frederickson, Clint, Graciarena, Martin, Kintzing, Donald, Leveque, Kyle, Mason, Shane, Niekrasz, John, Purver, Matthew, Riedhammer, Korbinian, Shriberg, Elizabeth, Tien, Jing, Vergyri, Dimitra, and Fan Yang. 2010. “The
CALO meeting assistant system”. IEEE
Transactions on Audio, Speech, and Language
Processing 18 (6): 1601–1611.
Wactlar, Howard D., Kanade, Takeo, Smith, Michael A., and Scott M. Stevens. 1996. “Intelligent
access to digital video: Informedia
project”. Computer 29 (5): 46–52.
Wadensjö, Cecilia. 1999. “Telephone
interpreting and the synchronization of talk in social
interaction”. The
Translator 5 (2): 247–264.
Waibel, Alex, Bett, Michael, Metze, Florian, Ries, Klaus, Schaaf, Thomas, Schultz, Tanja, Soltau, Hagen, Yu, Hua, and Klaus Zechner. 2001. “Advances
in automatic meeting record creation and
access”. In Proceedings
of IEEE International Conference on Acoustics, Speech, and Signal
Processing (ICASSP
2001). 597–600.
Waibel, Alex, and Rainer Stiefelhagen (eds.). 2009. Computers
in the Human Interaction
Loop. London: Springer.
Whittaker, Steve, Hyland, Patrick, and Myrtle Wiley. 1994. “FILOCHAT:
Handwritten notes provide access to recorded
conversations”. In Proceedings
of the SIGCHI conference on Human factors in computing
systems. 271–277.
Witten, Ian H., Moffat, Alistair, and Timothy C. Bell. 1999. Managing
Gigabytes: Compressing and Indexing Documents and
Images. San Francisco: Morgan Kaufmann.
Zhang, Xiaojun. 2015. “The
Changing Face of Conference
Interpreting”. In New
Horizons in Translation and Interpreting
Studies. The 7th International Conference
of the Iberian Association of Translation and Interpreting
Studies (AIETI), edited
by Gloria Corpas Pastor, Míriam Seghiri Domínguez, Rut Gutiérrez Florido and Míriam Urbano Mendaña, 255–263. Geneva: Editions Tradulex. [Online] Available
at [URL]
Zhao, Wen-Yi, Chellappa, Rama, Phillips, Jonathon, and Azriel Rosenfeld. 2003. “Face
Recognition: A Literature
Survey”. ACM Computing Surveys
(CSUR) 35 (4): 399–458.
Zhou, Bowen, Besacier, Laurent, and Yuqing Gao. 2007. “On
Efficient Coupling of ASR and SMT for Speech
Translation”. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007). IV-101–IV-104.
Zhu, Wenwu, Wang, Xin, and Hongzhi Li. 2020. “Multi-modal
Deep Analysis for Multimedia”. IEEE
Transactions on Circuits and Systems for Video
Technology. [Online] Available
at [URL].