Chapter 5
Recent claims of human-machine parity in translation highlight core
issues surrounding the human evaluation of machine translation
In 2018, the first claims of empirical backing for
human-machine parity in translation (HMPT) emerged at the WMT18 Conference
on Machine Translation and in a study using WMT resources. Other researchers
quickly refuted these claims, pointing to a flawed human evaluation
campaign. Subsequent HMPT claims at WMT19 were also empirically refuted.
This chapter discusses how recommendations for the human evaluation of MT
have evolved in response to these HMPT claims, and assesses the possibility
of HMPT at WMT20 in light of those recommendations. Finally, we summarize
the criteria for human evaluation of MT proposed in the recent literature.
Article outline
- 1. Introduction
- 2. 2018: First claims of human-machine parity in translation
- 2.1 Human evaluation of MT: Metrics, metrics, metrics
- 2.2 Critics of Hassan et al.'s (2018) claims of HMPT
- 3. WMT19: Additional claims of HMPT or even machine super-performance
- 4. WMT20: Continued innovation and greater caution
- 5. Conclusion
- Notes
- References