Vol. 11:2 (2022), pp. 213–233
Some Translation Studies informed suggestions for further balancing methodologies for machine translation quality evaluation
This article aims to contribute to the current debate on the quality of neural machine translation (NMT) relative to (professional) human translation, a debate in which claims of (super)human performance by NMT systems have recently emerged. The article critically analyses some current machine translation (MT) quality evaluation methodologies employed in studies that claim such performance for their MT systems. The aim of this analysis is to identify areas where these methodologies are potentially biased in favour of MT and may therefore overvalue MT performance while undervaluing human translation performance. The article then offers some Translation Studies-informed suggestions for improving, or debiasing, these methodologies in order to arrive at a more balanced picture of MT vs. (professional) human translation quality.
Article outline
- 1. Introduction
- 2. The need for properly balanced MT quality evaluation methodologies
- 3. The current debate on (super)human performance of NMT
- 3.1 Google: Bridging the gap between human and machine translation
- 3.2 Microsoft: Parity between professional human and machine translation
- 3.3 Criticism of Microsoft’s evaluation methodology
- 3.4 CUBBIT: Human translation is not the upper bound of translation quality
- 4. Suggestions for further balancing MT quality evaluation methodologies
- 4.1 The quality of the human reference translations against which MT quality is to be measured
- 4.2 The extent of translational context taken into consideration in the MT quality evaluation campaign
- 4.3 Weighing translation errors according to their severity
- 4.4 Integrating MT systems into high-quality translation settings in order to measure the added value of professional human translators
- 5. Areas where current NMT systems necessarily underperform compared to professional human translators
- 6. Conclusion
- Acknowledgements
- Notes
- References
https://doi.org/10.1075/ts.21026.kru