Chapter 12. Exploring variation in translation with probabilistic language models

Karakanta, Alina; Przybyl, Heike; Teich, Elke

doi:10.1075/btl.158.12kar

Part of

Corpora in Translation and Contrastive Research in the Digital Age: Recent advances and explorations
Edited by Julia Lavid-López, Carmen Maíz-Arévalo and Juan Rafael Zamorano-Mansilla
[Benjamins Translation Library 158] 2021
pp. 307–323

Alina Karakanta | Fondazione Bruno Kessler / University of Trento

Heike Przybyl | Department of Language Science and Technology, Saarland University

Elke Teich | Department of Language Science and Technology, Saarland University

While some authors have suggested that translationese fingerprints are universal, others have shown that translations vary considerably depending on source language shining through, translation type, or translation mode. In our work, we attempt to gain empirical insights into variation in translation, focusing here on translation mode (translation vs. interpreting). Our goal is to discover features of translationese and interpretese that distinguish translated and interpreted output from comparable original text/speech, as well as from each other, at different linguistic levels. We compare probabilistic language models using relative entropy (Kullback-Leibler Divergence) and visualize the results with word clouds. Our analysis shows differences in typical words between originals and non-originals as well as between translation modes, at both the lexical and the grammatical level.

Keywords: translationese, interpretese, relative entropy, language models, parallel corpora, comparable corpora
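
As an illustration of the relative-entropy comparison named in the abstract, the sketch below computes the Kullback-Leibler Divergence between two unigram language models and ranks words by their individual contribution, the kind of score that could drive word size in a word cloud. It is a minimal sketch under stated assumptions, not the chapter's actual procedure: the two toy sentences, the function names, and the choice of add-one smoothing over a shared vocabulary are all hypothetical.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Estimate a unigram distribution over vocab with additive smoothing.

    alpha=1.0 (add-one smoothing) is an assumption made for this sketch.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_contributions(p, q):
    """Per-word terms p(w) * log2(p(w) / q(w)) of D(P || Q), in bits."""
    return {w: p[w] * math.log2(p[w] / q[w]) for w in p}


def kl_divergence(p, q):
    """Relative entropy D(P || Q): the sum of the per-word terms."""
    return sum(kl_contributions(p, q).values())


# Toy data (hypothetical): one "original" and one "translated" sentence.
original = "we must act now and we must act together".split()
translated = "it is necessary that we act now and that we act jointly".split()

# Both models share one vocabulary so that every q(w) is non-zero.
vocab = set(original) | set(translated)
p = unigram_model(translated, vocab)
q = unigram_model(original, vocab)

divergence = kl_divergence(p, q)

# Words with the largest positive contribution are the most typical of
# the translated sample relative to the original sample.
typical = sorted(kl_contributions(p, q).items(),
                 key=lambda item: item[1], reverse=True)[:3]

print(f"D(translated || original) = {divergence:.3f} bits")
for word, score in typical:
    print(f"{word}: {score:.3f}")
```

Relative entropy is asymmetric, so swapping the two samples yields the words typical of the originals instead. A real analysis would need far larger corpora and a considered choice of smoothing.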

Article outline

1. Introduction
2. Corpus data
3. Methods
- 3.1 Probabilistic language models and analysis of translation variation
- 3.2 Comparing language models by relative entropy
4. Analysis and results
- 4.1 Translation direction: Originals vs. Translation/Interpreting
- 4.2 Translation mode: Translation vs. Interpreting
5. Summary and discussion
Acknowledgements
Notes
References

Published online: 8 December 2021

https://doi.org/10.1075/btl.158.12kar

Cited by (1)

Cited by one other publication

Shi, Yaqian & Lei Lei

2024. Structural Complexity in Adapted Reading Materials: A Study Based on the Amount of Information. Reading Research Quarterly

This list is based on CrossRef data as of 4 July 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.
