Article published in: Target, Vol. 34:2 (2022), pp. 278–308
Can a corpus-driven lexical analysis of human and machine translation unveil discourse features that set them apart?
There is still much to learn about the ways in which human and machine translation differ with regard to the contexts that regulate the production and interpretation of discourse. The present study explores whether a corpus-driven lexical analysis of human and machine translation can unveil discourse features that set the two apart. A balanced corpus of source texts aligned with authentic, professional translations and neural machine translations was compiled for the study. Lexical discrepancies in the two translation corpora were then extracted via a corpus-driven keyword analysis, and examined qualitatively through parallel concordances of source texts aligned with human and machine translation. The study shows that keyword analysis not only reiterates known problems of discourse in machine translation such as lexical inconsistency and pronoun resolution, but can also provide valuable insights regarding contextual aspects of translated discourse deserving further research.
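To make the idea of a corpus-driven keyword analysis concrete, here is a minimal sketch of one way to rank words that are unusually frequent in one translation corpus relative to another. The abstract does not say which keyness measure the study used; the ratio of smoothed per-million frequencies below is one common choice and is an assumption here, as are the function names `tokenize` and `keyness`, the `smoothing` parameter, and the toy sentences.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (illustrative only)."""
    return re.findall(r"[a-z']+", text.lower())


def keyness(focus_tokens, reference_tokens, smoothing=1.0):
    """Rank words by how much more frequent they are in the focus corpus.

    Score = (focus frequency per million + smoothing)
            / (reference frequency per million + smoothing).
    This measure is an assumption, not necessarily the one used in the study.
    """
    if not focus_tokens or not reference_tokens:
        raise ValueError("both corpora must contain at least one token")

    focus_counts = Counter(focus_tokens)
    ref_counts = Counter(reference_tokens)
    focus_size = len(focus_tokens)
    ref_size = len(reference_tokens)

    scores = {}
    for word in set(focus_counts) | set(ref_counts):
        # Normalise raw counts so corpora of different sizes are comparable.
        focus_pm = focus_counts[word] * 1_000_000 / focus_size
        ref_pm = ref_counts[word] * 1_000_000 / ref_size
        scores[word] = (focus_pm + smoothing) / (ref_pm + smoothing)

    # Highest scores first: words most typical of the focus corpus.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data standing in for human and machine translations.
    human = tokenize("She must go. She must stay.")
    machine = tokenize("She should go. She should stay.")
    for word, score in keyness(human, machine)[:3]:
        print(word, round(score, 2))
```

Words at the top of the ranking are candidates for qualitative inspection in parallel concordances, which is where discourse-level differences would actually be interpreted. Swapping the two arguments gives the keywords of the machine translation corpus instead.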
Article outline
- 1. Introduction
- 2. Background
- 3. Method
- 3.1 Materials
- 3.2 Procedure
- 4. Results
- 4.1 Grammatical keywords
- 4.1.1 Modals
- 4.1.2 Prepositions
- 4.1.3 Pronouns
- 4.2 Lexical keywords
- 4.2.1 Spelling
- 4.2.2 Proper names
- 4.2.3 Foreign words
- 5. Discussion and conclusion
- Acknowledgements
- Notes
- References
Cited by (1)
Niu, Jiang & Yue Jiang. 2024. Does simplification hold true for machine translations? A corpus-based analysis of lexical diversity in text varieties across genres. Humanities and Social Sciences Communications 11:1.
This list is based on CrossRef data as of 5 July 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.