Chapter 4
Post-editing neural machine translation in specialised languages
The role of corpora in the translation of phraseological structures
This study focuses on phraseology in specialised texts and on the difficulties students encounter with
phraseology when post-editing neural machine translation output. It is undertaken within the corpus-based methodological
framework that we have developed for several purposes, one of which is to assess the impact of corpus use on
translation and post-editing. The objective of the study is to propose a descriptive analysis of typical student
errors related to phraseology in order to design tailored pedagogical materials. We aim to show that, with consistent
training in querying corpora and in interpreting the results appropriately, students can improve
their output when translating specialised texts or post-editing machine translation output.
Article outline
- 1. Introduction
- 2. Theoretical background
- 2.1 Corpora in translation training
- 2.2 Machine translation and post-editing
- 2.3 Phraseology in LSP and NMT
- 3. Context, methods and data
- 3.1 Context
- 3.2 Methods and data
- 4. Analysis of typical student errors in post-editing phraseology
- 4.1 Type 1: Overconfidence in MT (or under-editing of MT output)
- 4.2 Type 2: Underconfidence in MT (or over-editing of MT output)
- 4.3 Type 3: Failure to correct MT output
- 5. Constructing classroom activities
- 6. Towards the analysis of NMT output on MWUs
- 7. Conclusion
- Notes
- References