Some Translation Studies informed suggestions for further balancing methodologies for machine translation quality evaluation

Krüger, Ralph

doi:10.1075/ts.21026.kru

Article published in:

Translation Spaces
Vol. 11:2 (2022), pp. 213–233

Ralph Krüger | TH Köln – University of Applied Sciences

This article contributes to the current debate on the quality of neural machine translation (NMT) vs. (professional) human translation, in which claims of (super)human NMT performance have recently emerged. It critically analyses some current machine translation (MT) quality evaluation methodologies employed in studies that claim such performance for their MT systems. The analysis aims to identify areas where these methodologies are potentially biased in favour of MT and may therefore overvalue MT performance while undervaluing human translation performance. The article then offers some Translation Studies informed suggestions for improving or debiasing these methodologies in order to arrive at a more balanced picture of MT vs. (professional) human translation quality.

Keywords: machine translation quality evaluation, professional human translation, (super)human MT performance, MT bias, translation studies

Article outline

1. Introduction
2. The need for properly balanced MT quality evaluation methodologies
3. The current debate on (super)human performance of NMT
- 3.1 Google: Bridging the gap between human and machine translation
- 3.2 Microsoft: Parity between professional human and machine translation
- 3.3 Criticism of Microsoft’s evaluation methodology
- 3.4 CUBBIT: Human translation is not the upper bound of translation quality
4. Suggestions for further balancing MT quality evaluation methodologies
- 4.1 The quality of the human reference translations against which MT quality is to be measured
- 4.2 The extent of translational context taken into consideration in the MT quality evaluation campaign
- 4.3 Weighing translation errors according to their severity
- 4.4 Integrating MT systems into high-quality translation settings in order to measure the added value of professional human translators
5. Areas where current NMT systems necessarily underperform compared to professional human translators
6. Conclusion
Acknowledgements
Notes
References

Published online: 18 March 2022

https://doi.org/10.1075/ts.21026.kru
