Machine translation quality in an audiovisual context

Aljoscha Burchardt, Arle Lommel, Lindsay Bywood, Kim Harris and Maja Popović
DFKI, Berlin | University College London | text&form/DFKI, Berlin | Humboldt-Universität zu Berlin

The volume of Audiovisual Translation (AVT) is increasing to meet the rising demand for data that needs to be accessible around the world. Machine Translation (MT) is one of the most innovative technologies to be deployed in the field of translation, but it is still too early to predict how it can support the creativity and productivity of professional translators in the future. Currently, MT is more widely used in (non-AV) text translation than in AVT. In this article, we discuss MT technology and demonstrate why its use in AVT scenarios is particularly challenging. We also present some potentially useful methods and tools for measuring MT quality that have been developed primarily for text translation. The ultimate objective is to bridge the gap between the tech-savvy AVT community, on the one hand, and researchers and developers in the field of high-quality MT, on the other.

Audiovisual translation (AVT) has become a fundamental necessity in the 21st century. Media formats such as VHS and LaserDisc have come and gone, and translation tools have progressed from typewriters to fully integrated real-time web-based translation environments. Our world is becoming ever smaller, while the demand for information in every corner of the globe is growing. Consequently, the sheer volume of data that needs to be accessible in most regions and languages of the world is rising dramatically: every minute, 300 hours of video material are uploaded to YouTube. Even assuming that only a small fraction of this content is of interest to a broader global audience, the effort required to publish it in multiple languages is a tremendous challenge.

This has been recognized and acknowledged by research bodies and governments that have supported early-adopter projects involving automatic AV translation. Such projects include MUSA and eTITLE, which have used rule-based MT combined with translation memory to investigate the potential of these tools for AVT; SUMAT, which has trained statistical machine translation engines on subtitles in seven bi-directional language pairs and performed an extensive evaluation of the resulting MT quality; EU-Bridge, which has focused on further advancing the state of the art in automatic speech recognition combined with MT with a view to applying this technology in several domains, including AVT; HBB4ALL, which, although mainly focused on accessibility, has carried out research into the reception of automatic interlingual subtitles; and ALST, a project whose aim was to implement existing automatic speech recognition, speech synthesis and MT technologies in audio description and voice-over, part of which included quality assessment of voice-over scripts produced using MT and post-editing.
