Source language classification of indirect translations

Ilmari Ivaska and Laura Ivaska

One of the major barriers to the systematic study of indirect translation – that is, translations of translations – is the lack of efficient methods for identifying these translations. In this article, we use supervised machine learning to examine whether computers can be harnessed to identify indirect translations. Our data consist of a monolingual comparable corpus that includes (1) nontranslated Finnish texts, (2) direct translations from English, French, German, Greek, and Swedish into Finnish, and (3) indirect translations from Greek (the ultimate source language) via English, French, German, and Swedish (mediating languages) into Finnish. We use n-grams of various types and lengths as feature sets and random forests as the statistical classification technique. To maximize the transferability of the method, we implemented the feature sets in accordance with the Universal Dependencies framework. This study confirms that computers can distinguish between translated and nontranslated Finnish, as well as between Finnish translations made from different source languages. For indirect translations, however, the ultimate source language has a greater impact on the linguistic composition of the Finnish translations than their respective mediating languages do, and as a result the indirect translations could not be reliably identified. Our results therefore suggest that the reliable computational identification of indirect translations and their mediating languages requires a way to control for the effect of the ultimate source language.
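The n-gram feature extraction summarized above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes that the texts have already been parsed into CoNLL-U, the file format of Universal Dependencies (parsers such as the Turku Neural Parser Pipeline produce it), and it assumes that n-gram "types" correspond to annotation layers such as word forms, lemmas, UPOS tags, or dependency relations. The function names and the sample annotation are our own. Each text becomes a profile of relative n-gram frequencies, and the profiles are aligned into a matrix with one row per text. The rows could then be passed to a random forest classifier (the reference list cites R and the ranger package); this sketch stops before classification.

```python
from collections import Counter

# 0-based indices of selected CoNLL-U columns
# (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC).
FORM, LEMMA, UPOS, DEPREL = 1, 2, 3, 7


def read_conllu(text: str) -> list[list[list[str]]]:
    """Split CoNLL-U text into sentences; each sentence is a list of token rows."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
        elif line.startswith("#"):
            continue  # sentence-level comments, e.g. "# text = ..."
        else:
            cols = line.split("\t")
            # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
            if "-" in cols[0] or "." in cols[0]:
                continue
            current.append(cols)
    if current:
        sentences.append(current)
    return sentences


def ngram_profile(sentences: list[list[list[str]]], field: int, n: int) -> dict[str, float]:
    """Relative frequencies of the n-grams of one annotation column, counted within sentences."""
    counts = Counter()
    for sent in sentences:
        values = [row[field] for row in sent]
        for i in range(len(values) - n + 1):
            counts[" ".join(values[i:i + n])] += 1
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def to_feature_matrix(profiles: list[dict[str, float]]) -> tuple[list[str], list[list[float]]]:
    """Align per-text profiles on a shared sorted vocabulary; absent n-grams get 0.0."""
    vocab = sorted(set().union(*profiles))
    return vocab, [[p.get(g, 0.0) for g in vocab] for p in profiles]


# Hand-written, illustrative annotation of "Kerro minulle, Zorbas" (not parser output).
rows = [
    ["1", "Kerro", "kertoa", "VERB", "_", "_", "0", "root", "_", "_"],
    ["2", "minulle", "minä", "PRON", "_", "_", "1", "obl", "_", "_"],
    ["3", ",", ",", "PUNCT", "_", "_", "4", "punct", "_", "_"],
    ["4", "Zorbas", "Zorbas", "PROPN", "_", "_", "1", "vocative", "_", "_"],
]
sample = "# text = Kerro minulle, Zorbas\n" + "\n".join("\t".join(r) for r in rows) + "\n\n"

sents = read_conllu(sample)
profile = ngram_profile(sents, UPOS, 2)  # three UPOS bigrams, each with relative frequency 1/3
vocab, matrix = to_feature_matrix([profile])  # one row per text, one column per n-gram
```

Using relative rather than raw frequencies keeps texts of different lengths comparable, and counting n-grams within sentences avoids n-grams that straddle sentence boundaries; both are design choices of this sketch.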


In this article, we study indirect translation (ITr), which, put simply, is translating from translation(s). For example, the Finnish translation Kerro minulle, Zorbas ‘Tell me, Zorbas’ (1954b), by Vappu Roos, of Nikos Kazantzakis’s novel Βίος και πολιτεία του Αλέξη Ζορμπά Vios kai politeía tou Aléxi Zormpá (1946, published in Carl Wildman’s English translation as Zorba the Greek [1952]) was made not from the original Greek but from the French translation Alexis Zorba (1954a) by Yvonne Gauthier, Gisèle Prassinos, and Pierre Fridas. In this case, the ITr forms the chain Greek–French–Finnish, where Greek is the ultimate source language (ultimate SL), French is the mediating language, and Finnish is the ultimate target language (ultimate TL).

An ITr may also be compilative, that is, based on several source texts (STs) in one or more SLs. For example, the Finnish translation Veljesviha ‘Hatred of brothers’ (1967), by Kyllikki Villa, of Kazantzakis’s novel Οι Αδερφοφάδες Oi aderfofádes (1963, published in Athena Gianakas Dallas’s English translation as The Fratricides [1964]) has three STs: the French translation (Les frères ennemis ‘The enemy brothers’, 1965, translated by Pierre Aellig), the English translation, and the Greek version (for more details, see L. Ivaska [2021]; for a discussion of further types of ITr, see, e.g., Washbourne 2013; Assis Rosa, Pięta, and Bueno Maia 2017).
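The terms introduced above (ultimate SL, mediating language, ultimate TL, compilative ITr) can be made concrete with a small data structure in which each text lists the texts it was translated from. This is a hypothetical illustration, not part of the study's method, and the class and method names are our own. Treating a text as indirect when at least one of its sources is itself a translation, and as compilative when it has more than one source, are our simplified readings of the definitions above. For the Veljesviha example we also assume, purely for illustration, that the English and French versions were translated directly from the Greek, which the passage does not state.

```python
from dataclasses import dataclass, field


@dataclass
class Text:
    """A text in a translation network; `sources` are the texts it was translated from."""
    title: str
    language: str
    year: int
    sources: list["Text"] = field(default_factory=list)

    def is_translation(self) -> bool:
        return bool(self.sources)

    def is_indirect(self) -> bool:
        # Indirect if at least one source text is itself a translation.
        return any(s.is_translation() for s in self.sources)

    def is_compilative(self) -> bool:
        # Compilative if based on more than one source text.
        return len(self.sources) > 1

    def ultimate_source_languages(self) -> set[str]:
        # Follow every source path back to texts that are not translations.
        if not self.is_translation():
            return {self.language}
        return set().union(*(s.ultimate_source_languages() for s in self.sources))

    def mediating_languages(self) -> set[str]:
        # Languages of the intermediate translations anywhere upstream of this text.
        langs: set[str] = set()
        for s in self.sources:
            if s.is_translation():
                langs.add(s.language)
                langs |= s.mediating_languages()
        return langs


# Linear ITr: Greek–French–Finnish.
zorba_gr = Text("Βίος και πολιτεία του Αλέξη Ζορμπά", "Greek", 1946)
zorba_fr = Text("Alexis Zorba", "French", 1954, [zorba_gr])
zorba_fi = Text("Kerro minulle, Zorbas", "Finnish", 1954, [zorba_fr])

# Compilative ITr with three STs.
# Assumption for illustration: the English and French versions were translated from the Greek.
fratr_gr = Text("Οι Αδερφοφάδες", "Greek", 1963)
fratr_en = Text("The Fratricides", "English", 1964, [fratr_gr])
fratr_fr = Text("Les frères ennemis", "French", 1965, [fratr_gr])
fratr_fi = Text("Veljesviha", "Finnish", 1967, [fratr_fr, fratr_en, fratr_gr])
```

Storing sources as a list rather than a single parent is what lets one structure represent both linear chains and compilative translations.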



References
Assis Rosa, Alexandra, Hanna Pięta, and Rita Bueno Maia
2017 “Theoretical, Methodological and Terminological Issues Regarding Indirect Translation: An Overview.” Translation Studies 10 (2): 113–132.
Baker, Mona
1993 “Corpus Linguistics and Translation Studies – Implications and Applications.” In Text and Technology: In Honour of John Sinclair, edited by Mona Baker, Gill Francis, and Elena Tognini-Bonelli, 233–250. Amsterdam: John Benjamins.
Baroni, Marco, and Silvia Bernardini
2006 “A New Approach to the Study of Translationese: Machine-Learning the Difference between Original and Translated Text.” Literary and Linguistic Computing 21 (3): 259–274.
Breiman, Leo
2001 “Random Forests.” Machine Learning 45 (1): 5–32.
Cartoni, Bruno, Sandrine Zufferey, and Thomas Meyer
2013 “Using the Europarl Corpus for Cross-Linguistic Research.” Belgian Journal of Linguistics 27 (1): 23–42.
Čermák, František, and Alexandr Rosen
2012 “The Case of InterCorp: A Multilingual Parallel Corpus.” International Journal of Corpus Linguistics 17 (3): 411–427.
Fernández Muñiz, Iris
2016 “Tracking Sources in Indirect Translation Archaeology: A Case Study on a 1917 Spanish Translation of Ibsen’s Et Dukkehjem (1879).” In New Horizons in Translation Research and Education 4, edited by Turo Rautaoja, Tamara Mikolič Južnič, and Kaisa Koskinen, 115–132. Joensuu: University of Eastern Finland.
Genette, Gérard
1991 “Introduction to the Paratext.” New Literary History 22 (2): 261–272.
Hanes, Vanessa Lopes Lourenço
2017 “Between Continents: Agatha Christie’s Translations as Intercultural Mediators.” Cadernos de Tradução 37 (1): 208–229.
Islam, Zahurul, and Armin Hoenen
2013 “Source and Translation Classification Using Most Frequent Words.” In Proceedings of the Sixth International Joint Conference on Natural Language Processing, Nagoya, Japan, 14–18 October 2013, edited by Ruslan Mitkov and Jong C. Park, 1299–1305. Nagoya: Asian Federation of Natural Language Processing.
Ivaska, Ilmari, and Silvia Bernardini
2020 “Constrained Language Use in Finnish: A Corpus-Driven Approach.” Nordic Journal of Linguistics 43 (1): 33–57.
Ivaska, Laura
2019 “Distinguishing Translations from Non-translations and Identifying (In)direct Translations’ Source Languages.” In Proceedings of the Research Data and Humanities (RDHum) 2019 Conference: Data, Methods and Tools, edited by Jarmo Harri Jantunen, Sisko Brunni, Niina Kunnas, Santeri Palviainen, and Katja Västi. Studia humaniora ouluensia 17, 125–138. Oulu: University of Oulu.
2020 “Identifying (Indirect) Translations and Their Source Languages in the Finnish National Bibliography Fennica: Problems and Solutions.” MikaEL 13: 75–88.
2021 “The Genesis of a Compilative Translation and its de facto Source Text.” In Genetic Translation Studies: Conflict and Collaboration in Liminal Spaces, edited by Ariadne Nunes, Joana Moura, and Marta Pacheco Pinto, 72–88. London: Bloomsbury.
Kanerva, Jenna, Filip Ginter, Niko Miekka, Akseli Leino, and Tapio Salakoski
2018 “Turku Neural Parser Pipeline: An End-to-End System for the CoNLL 2018 Shared Task.” In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, edited by Daniel Zeman and Jan Hajič, 133–142. Brussels: Association for Computational Linguistics.
Kazantzakis, Nikos
1946 Βίος και πολιτεία του Αλέξη Ζορμπά [Life and times of Alexis Zorbas]. Athens: Dimitrakou.
1952 Zorba the Greek. Translated by Carl Wildman. New York: Simon and Schuster.
1954a Alexis Zorba. Translated by Yvonne Gauthier, Gisèle Prassinos, and Pierre Fridas. Paris: Plon.
1954b Kerro minulle, Zorbas [Tell me, Zorbas]. Translated by Vappu Roos. Helsinki: Tammi.
1963 Οι Αδερφοφάδες [The fratricides]. Athens: Unknown.
1964 The Fratricides. Translated by Athena Gianakas Dallas. New York: Simon and Schuster.
1965 Les frères ennemis [The enemy brothers]. Translated by Pierre Aellig. Paris: Plon.
1967 Veljesviha [Hatred of brothers]. Translated by Kyllikki Villa. Helsinki: Tammi.
Koehn, Philipp
2005 “Europarl: A Parallel Corpus for Statistical Machine Translation.” In Proceedings of Machine Translation Summit X: Papers, 79–86. Phuket: Association for Computational Linguistics.
Koppel, Moshe, and Noam Ordan
2011 “Translationese and its Dialects.” In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, edited by Dekang Lin, 1318–1326. Portland: Association for Computational Linguistics.
Lynch, Gerard, and Carl Vogel
2012 “Towards the Automatic Detection of the Source Language of a Literary Translation.” In Proceedings of COLING 2012: Posters, edited by Martin Kay and Christian Boitet, 775–784. Mumbai: The COLING 2012 Organizing Committee.
Mauranen, Anna
2004 “Corpora, Universals and Interference.” In Translation Universals: Do They Exist? edited by Anna Mauranen and Pekka Kujamäki, 65–82. Amsterdam: John Benjamins.
Meyer, David, Evgenia Dimitriadou, Kurt Hornik, Andreas Weingessel, and Friedrich Leisch
2021 e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien.
Nisioi, Sergiu
2015 “Unsupervised Classification of Translated Texts.” In Natural Language Processing and Information Systems, edited by Chris Biemann, Siegfried Handschuh, André Freitas, Farid Meziane, and Elisabeth Métais, 323–334. Cham: Springer.
Nivre, Joakim, Marie-Catherine de Marneffe, Filip Ginter, Jan Hajič, Christopher D. Manning, Sampo Pyysalo, Sebastian Schuster, Francis Tyers, and Daniel Zeman
2020 “Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection.” In Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), edited by Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck et al., 4034–4043. Marseille: European Language Resources Association.
Popescu, Marius
2011 “Studying Translationese at the Character Level.” In Proceedings of the International Conference Recent Advances in Natural Language Processing 2011, edited by Ruslan Mitkov and Galia Angelova, 634–639. Hissar: Association for Computational Linguistics.
R Core Team
2021 R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.
Rabinovich, Ella, Sergiu Nisioi, Noam Ordan, and Shuly Wintner
2016 “On the Similarities between Native, Non-Native and Translated Texts.” In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, edited by Katrin Erk and Noah A. Smith, 1870–1881. Berlin: Association for Computational Linguistics.
Rabinovich, Ella, Noam Ordan, and Shuly Wintner
2017 “Found in Translation: Reconstructing Phylogenetic Language Trees from Translations.” In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, edited by Regina Barzilay and Min-Yen Kan, 530–540. Vancouver: Association for Computational Linguistics.
Rabinovich, Ella, and Shuly Wintner
2015 “Unsupervised Identification of Translationese.” Transactions of the Association for Computational Linguistics 3: 419–432.
Toury, Gideon
2012 Descriptive Translation Studies – and Beyond. Amsterdam: John Benjamins.
Ustaszewski, Michael
2021 “Towards a Machine Learning Approach to the Analysis of Indirect Translation.” Translation Studies 14 (3): 313–331.
Volansky, Vered, Noam Ordan, and Shuly Wintner
2015 “On the Features of Translationese.” Digital Scholarship in the Humanities 30 (1): 98–118.
Washbourne, Kelly
2013 “Nonlinear Narratives: Paths of Indirect and Relay Translation.” Meta 58 (3): 607–625.
Wright, Marvin N., and Andreas Ziegler
2017 “ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R.” Journal of Statistical Software 77 (1): 1–17.
Zei, Alki
1971 Ο μεγάλος περίπατος του Πέτρου [Petros’ long journey]. Athens: Kedros.
1972 Petros’ War. Translated by Edward Fenton. New York: E. P. Dutton.
1973 Tämä on sotaa, Petros [This is war, Petros]. Translated by Marikki Makkonen. Porvoo: WSOY.