Can a corpus-driven lexical analysis of human and machine translation unveil discourse features that set them apart?

Frankenberg-Garcia, Ana

doi:10.1075/target.20065.fra

Article published In:

Target
Vol. 34:2 (2022), pp. 278–308


Ana Frankenberg-Garcia | University of Surrey

There is still much to learn about the ways in which human and machine translation differ with regard to the contexts that regulate the production and interpretation of discourse. The present study explores whether a corpus-driven lexical analysis of human and machine translation can unveil discourse features that set the two apart. A balanced corpus of source texts aligned with authentic, professional translations and neural machine translations was compiled for the study. Lexical discrepancies between the two translation corpora were then extracted via a corpus-driven keyword analysis and examined qualitatively through parallel concordances of source texts aligned with their human and machine translations. The study shows that keyword analysis not only reiterates known discourse problems in machine translation, such as lexical inconsistency and pronoun resolution, but can also provide valuable insights into contextual aspects of translated discourse that deserve further research.
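The abstract does not say which keyness statistic was used to extract the lexical discrepancies. Because the reference list includes Kilgarriff's "Simple Maths for Keywords" (2009) and the Sketch Engine, the Python sketch below assumes that score, (fpm_focus + n) / (fpm_reference + n), where fpm is frequency per million. It pairs the score with a minimal parallel-concordance filter over source/human/machine triples. The function names, the naive tokenizer, the assumed smoothing value n = 1 and the two Portuguese–English segments are hypothetical illustrations, not the study's tools or data.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokenizer (letters only, keeps internal apostrophes).

    Deliberately naive: a real corpus tool applies its own tokenization rules.
    """
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text.lower())


def per_million(counts: Counter, total: int) -> dict[str, float]:
    """Convert raw frequencies to frequencies per million tokens."""
    return {word: count * 1_000_000 / total for word, count in counts.items()}


def simple_maths_keywords(focus, reference, n=1.0, top=10):
    """Rank word types of `focus` by (fpm_focus + n) / (fpm_reference + n).

    `n` is the smoothing constant of the "simple maths" score (assumed 1 here);
    it avoids division by zero for words absent from the reference corpus.
    """
    focus_fpm = per_million(Counter(focus), len(focus))
    ref_fpm = per_million(Counter(reference), len(reference))
    scores = {w: (f + n) / (ref_fpm.get(w, 0.0) + n) for w, f in focus_fpm.items()}
    # Highest keyness first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


def parallel_concordance(node, aligned_segments):
    """Return (source, human, machine) triples whose machine side contains `node`."""
    node = node.lower()
    return [seg for seg in aligned_segments if node in tokenize(seg[2])]


# Hypothetical toy data, not taken from the study's corpus.
segments = [
    ("O relatório pode ser adiado.",
     "The report may be delayed.",
     "The report can be delayed."),
    ("O ministro diz que isso pode mudar.",
     "The minister says it may change.",
     "The minister says it can change."),
]
human = [t for _, ht, _ in segments for t in tokenize(ht)]
machine = [t for _, _, mt in segments for t in tokenize(mt)]

print(simple_maths_keywords(machine, human, top=3))
for src, ht, mt in parallel_concordance("can", segments):
    print(src, "|", ht, "|", mt)
```

With this toy data, *can* is the only machine-translation keyword (every word shared by both corpora at equal frequency scores exactly 1.0), and the concordance returns both aligned segments, so the shift from *may* to *can* may be inspected against the source *pode*. Swapping the two corpora surfaces *may* as a human-translation keyword instead.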

Keywords: machine translation, MT, professional translation, discourse, parallel corpora, keyword analysis

Article outline

1. Introduction
2. Background
3. Method
- 3.1 Materials
- 3.2 Procedure
4. Results
- 4.1 Grammatical keywords
  - 4.1.1 Modals
  - 4.1.2 Prepositions
  - 4.1.3 Pronouns
- 4.2 Lexical keywords
  - 4.2.1 Spelling
  - 4.2.2 Proper names
  - 4.2.3 Foreign words
5. Discussion and conclusion
Acknowledgements
Notes
References

Available under the Creative Commons Attribution (CC BY) 4.0 license.

For any use beyond this license, please contact the publisher.

Published online: 8 September 2021


References (41)

Bawden, Rachel

2016 “Cross-lingual Pronoun Prediction with Linguistically Informed Features.” In Proceedings of the First Conference on Machine Translation, Berlin, Germany, 11–12 August, 564–570. Stroudsburg: Association for Computational Linguistics.

Blum-Kulka, Shoshana

1986 “Shifts of Cohesion and Coherence in Translation.” In Interlingual and Intercultural Communication: Discourse and Cognition in Translation and Second Language Acquisition Studies, edited by Juliane House and Shoshana Blum-Kulka, 17–35. Tübingen: Gunter Narr.

Carpuat, Marine, and Michel Simard

2012 “The Trouble with SMT Consistency.” In Proceedings of the Seventh Workshop on Statistical Machine Translation, Montréal, Canada, 7–8 June, edited by Chris Callison-Burch, Philipp Koehn, Christof Monz, Matt Post, Radu Soricut, and Lucia Specia, 442–449. Stroudsburg: Association for Computational Linguistics.

Catford, John C.

1965 A Linguistic Theory of Translation: An Essay in Applied Linguistics. Oxford: Oxford University Press.

COMPARA

2010 (Version 13.1.17.) Accessed April 12, 2019. [URL]

De Beaugrande, Robert, and Wolfgang Dressler

1981 Introduction to Text Linguistics. London: Longman.

Dougal, Duane K., and Deryle Lonsdale

2020 “Improving NMT Quality Using Terminology Injection.” In Proceedings of the Twelfth International Conference on Language Resources and Evaluation, Marseille, France, 11–16 May, edited by Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, 4820–4827. Paris: European Language Resources Association. [URL]

Frankenberg-Garcia, Ana

2008 “ ‘Suggesting Rather Special Facts’: A Corpus-Based Study of Distinctive Lexical Distributions in Translated Texts.” Corpora 3 (2): 195–211.

2009 “Are Translations Longer than Source Texts? A Corpus-Based Study of Explicitation.” In Corpus Use and Translating: Corpus Use for Learning to Translate and Learning Corpus Use to Translate: An Introduction, edited by Allison Beeby, Patricia Rodríguez Inés, and Pilar Sánchez-Gijón, 47–58. Amsterdam: John Benjamins.

2016 “A Corpus Study of Loans in Translated and Non-Translated Texts.” In Corpus-Based Approaches to Translation and Interpreting: From Theory to Applications, edited by Gloria Corpas Pastor and Miriam Seghiri, 19–42. Frankfurt: Peter Lang.

Frankenberg-Garcia, Ana, and Diana Santos

2003 “Introducing COMPARA: The Portuguese–English Parallel Corpus.” In Corpora in Translator Education, edited by Federico Zanettin, Silvia Bernardini, and Dominic Stewart, 71–87. Manchester: St. Jerome.

Google Translator Toolkit

(2019) Accessed December 1, 2019. [URL]

Guillou, Liane

2013 “Analysing Lexical Consistency in Translation.” In Proceedings of the Workshop on Discourse in Machine Translation, Sofia, Bulgaria, 9 August, edited by Bonnie Webber, Andrei Popescu-Belis, Katja Markert, and Jörg Tiedemann, 10–18. Stroudsburg: Association for Computational Linguistics. [URL]

2016 Incorporating Pronoun Function into Statistical Machine Translation. PhD diss. University of Edinburgh.

Guillou, Liane, Christian Hardmeier, Ekaterina Lapshinova-Koltunski, and Sharid Loáiciga

2018 “A Pronoun Test Suite Evaluation of the English–German MT Systems at WMT 2018.” In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, Brussels, Belgium, 31 October – 1 November, edited by Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Lucia Specia, Marco Turchi, and Karin Verspoor, 570–577. Stroudsburg: Association for Computational Linguistics.

Halliday, M. A. K.

1978 Language as a Social Semiotic: The Social Interpretation of Language and Meaning. London: Edward Arnold.

Hardmeier, Christian

2014 Discourse in Statistical Machine Translation. PhD diss. Uppsala University.

House, Juliane

2006 “Text and Context in Translation.” Journal of Pragmatics 38 (3): 338–358.

Kilgarriff, Adam

2009 “Simple Maths for Keywords.” In Proceedings of Corpus Linguistics Conference, Liverpool, UK. [URL]

Kilgarriff, Adam, Vit Baisa, Jan Bušta, Miloš Jakubíček, Vojtěch Kovář, Jan Michelfeit, Pavel Rychlý, and Vit Suchomel

2014 “The Sketch Engine: Ten Years On.” Lexicography 1: 7–36.

Klaudy, Kinga

2009 “The Asymmetry Hypothesis in Translation Research.” In Translators and Their Readers: In Homage to Eugene A. Nida, edited by Rodica Dimitriu and Miriam Shlesinger, 283–303. Brussels: Les Editions du Hazard.

2017 “Linguistic and Cultural Asymmetry in Translation from and into Minor Languages.” Cadernos de Literatura em Tradução 17: 22–37.

Koehn, Philipp

2005 “Europarl: A Parallel Corpus for Statistical Machine Translation.” In Proceedings of the Tenth Machine Translation Summit, Phuket, Thailand, 12–16 September, 79–86. Tokyo: Asia-Pacific Association for Machine Translation. [URL]

Koehn, Philipp, and Josh Schroeder

2007 “Experiments in Domain Adaptation for Statistical Machine Translation.” In Proceedings of the Second Workshop on Statistical Machine Translation, Prague, Czech Republic, 23 June, 224–227. Stroudsburg: Association for Computational Linguistics.

Lapshinova-Koltunski, Ekaterina, and Christian Hardmeier

2017 “Discovery of Discourse-Related Language Contrasts through Alignment Discrepancies in English–German Translation.” In Proceedings of the Third Workshop on Discourse in Machine Translation, Copenhagen, Denmark, 8 September, edited by Bonnie Webber, Andrei Popescu-Belis, and Jörg Tiedemann, 73–81.

Läubli, Samuel, Rico Sennrich, and Martin Volk

2018 “Has Machine Translation Achieved Human Parity? A Case for Document-Level Evaluation.” In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October – 4 November, edited by Ellen Riloff, David Chiang, Julia Hockenmaier, and Jun’ichi Tsujii, 4791–4796. Stroudsburg: Association for Computational Linguistics.

Luong, Ngoc-Quang, and Andrei Popescu-Belis

2016 “A Contextual Language Model to Improve Machine Translation of Pronouns by Re-ranking Translation Hypotheses.” In Proceedings of the 19th Annual Conference of the European Association for Machine Translation, Riga, Latvia, special issue of Baltic Journal of Modern Computing 4 (2): 292–304.

Luong, Ngoc-Quang, Andrei Popescu-Belis, Annette Rios Gonzales, and Don Tuggener

2017 “Machine Translation of Spanish Personal and Possessive Pronouns Using Anaphora Probabilities.” In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Vol. 2, Short Papers, Valencia, Spain, 3–7 April, edited by Mirella Lapata, Phil Blunsom, and Alexander Koller, 631–636. Stroudsburg: Association for Computational Linguistics.

Morante, Roser, and Caroline Sporleder

2012 “Modality and Negation: An Introduction to the Special Issue.” Computational Linguistics 38 (2): 223–260.

Nakov, Preslav

2016 “Negation and Modality in Machine Translation.” In Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics, Osaka, Japan, 12 December, edited by Eduardo Blanco, Roser Morante, and Roser Saurí, 41. Stroudsburg: Association for Computational Linguistics. [URL]

Popescu-Belis, Andrei, Sharid Loáiciga, Christian Hardmeier, and Deyi Xiong

eds. 2019 Proceedings of the Fourth Workshop on Discourse in Machine Translation, Hong Kong, China, 3 November. Stroudsburg: Association for Computational Linguistics. [URL]

Pym, Anthony

2015 “Translating as Risk Management.” Journal of Pragmatics 85: 67–80.

Schleiermacher, Friedrich

(1813) 2004 “On the Different Methods of Translating.” In The Translation Studies Reader, 2nd ed., edited by Lawrence Venuti, 43–63. London: Routledge.

Tiedemann, Jörg

2012 “Parallel Data, Tools and Interfaces in OPUS.” In Proceedings of the 8th International Conference on Language Resources and Evaluation, Istanbul, Turkey, edited by Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, 2214–2218. Stroudsburg: Association for Computational Linguistics. [URL]

Tirkkonen-Condit, Sonja

1990 “Professional vs. Non-Professional Translation: A Think-Aloud Protocol Study.” In Learning, Keeping and Using Language: Selected Papers from the Eighth World Congress of Applied Linguistics, Sydney, 16–21 August 1987, edited by M. A. K. Halliday, John Gibbons, and Howard Nicholas, 381–394. Amsterdam: John Benjamins.

Tognini-Bonelli, Elena

2001 Corpus Linguistics at Work. Amsterdam: John Benjamins.

Toral, Antonio, and Andy Way

2018 “What Level of Quality Can Neural Machine Translation Attain on Literary Text?” In Translation Quality Assessment: From Principles to Practice, vol. 1, edited by Joss Moorkens, Sheila Castilho, Federico Gaspari, and Stephen Doherty, 263–287. Cham: Springer.

Turovsky, Barak

2016 “Found in Translation: More Accurate, Fluent Sentences in Google Translate.” Google (blog), November 15, 2016. [URL]

Van Dijk, Teun A.

1977 Text and Context: Explorations in the Semantics and Pragmatics of Discourse. Harlow: Longman.

Vinay, Jean-Paul, and Jean Darbelnet

(1958) 2004 “A Methodology for Translation.” In The Translation Studies Reader, 2nd ed., edited by Lawrence Venuti, 128–137. London: Routledge.

Webber, Bonnie, Andrei Popescu-Belis, and Jörg Tiedemann

eds. 2017 Proceedings of the Third Workshop on Discourse in Machine Translation, Copenhagen, Denmark, 8 September. [URL]

Cited by (1)


Niu, Jiang & Yue Jiang

2024. Does simplification hold true for machine translations? A corpus-based analysis of lexical diversity in text varieties across genres. Humanities and Social Sciences Communications 11:1

This list is based on CrossRef data as of 5 July 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.