Chapter 12
Exploring variation in translation with probabilistic language models
Heike Przybyl | Department of Language Science and Technology, Saarland University
Elke Teich | Department of Language Science and Technology, Saarland University
While some authors have suggested that translationese fingerprints are universal, others have shown that there is a fair amount of variation among translations due to source-language shining through, translation type or translation mode. In our work, we attempt to gain empirical insights into variation in translation, focusing here on translation mode (translation vs. interpreting). Our goal is to discover features of translationese and interpretese that distinguish translated and interpreted output both from comparable original text/speech and from each other at different linguistic levels. To this end, we compare probabilistic language models by relative entropy (Kullback-Leibler divergence) and visualize the results with word clouds. Our analysis shows differences in typical words between originals and non-originals as well as between translation modes, at both the lexical and the grammatical level.
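Relative entropy measures how many extra bits are needed, on average, when words drawn from corpus A are encoded with a model of corpus B: D(A || B) = Σ_w p(w|A) · log2( p(w|A) / p(w|B) ). Each word's summand indicates how typical that word is of A relative to B. A word cloud can display these values by scaling each word by its contribution.

The sketch below is a minimal Python illustration, not the authors' implementation. It rests on several assumptions of our own:

- unigram models;
- whitespace tokenization;
- Jelinek-Mercer smoothing against the pooled corpora, with an illustrative weight λ = 0.05.

The function names (`smoothed_unigram`, `relative_entropy`, `typical_words`) are hypothetical.

```python
import math
from collections import Counter


def smoothed_unigram(tokens, background, lam=0.05):
    """Unigram model with Jelinek-Mercer smoothing (an assumed choice).

    Mixes the corpus's own relative frequencies with a background
    distribution, so every word in the shared vocabulary gets p > 0.
    `lam` is the background weight; 0.05 is an illustrative value.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: (1 - lam) * counts[w] / total + lam * p_bg
            for w, p_bg in background.items()}


def relative_entropy(tokens_a, tokens_b, lam=0.05):
    """Return D(A || B) in bits and each word's contribution to it."""
    # Background model: relative frequencies in the pooled corpora.
    pooled = Counter(tokens_a) + Counter(tokens_b)
    n = sum(pooled.values())
    background = {w: c / n for w, c in pooled.items()}

    p = smoothed_unigram(tokens_a, background, lam)
    q = smoothed_unigram(tokens_b, background, lam)

    # Pointwise term p(w|A) * log2(p(w|A) / p(w|B)) for every word.
    contrib = {w: p[w] * math.log2(p[w] / q[w]) for w in background}
    return sum(contrib.values()), contrib


def typical_words(contrib, k=10):
    """Up to k words with the largest positive contribution (typical of A)."""
    ranked = sorted(contrib.items(), key=lambda item: item[1], reverse=True)
    return [(w, c) for w, c in ranked[:k] if c > 0]


if __name__ == "__main__":
    # Toy data for illustration only; not from the chapter's corpus.
    interpreted = "we have to act now we have to act together".split()
    original = "we must act now and we must act together".split()
    kld, contrib = relative_entropy(interpreted, original)
    print(f"D(interpreted || original) = {kld:.3f} bits")
    for word, c in typical_words(contrib, k=5):
        print(word, round(c, 3))
```

Words with negative contributions are, conversely, more typical of B. Relative entropy is asymmetric, so swapping the two arguments gives the view from the other corpus.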
Article outline
- 1. Introduction
- 2. Corpus data
- 3. Methods
- 3.1 Probabilistic language models and analysis of translation variation
- 3.2 Comparing language models by relative entropy
- 4. Analysis and results
- 4.1 Translation direction: Originals vs. Translation/Interpreting
- 4.2 Translation mode: Translation vs. Interpreting
- 5. Summary and discussion
- Acknowledgements
- Notes
- References
References
Baroni, Marco, and Silvia Bernardini. 2006. “A new approach to the study of Translationese: Machine-learning the difference between original and translated text”. Literary and Linguistic Computing 21(3): 259–274.
Bendazzoli, Claudio, and Annalisa Sandrelli. 2005. “An Approach to Corpus-Based Interpreting Studies: Developing EPIC (European Parliament Interpreting Corpus)”. MuTra 2005 – Challenges of Multidimensional Translation: Conference Proceedings.
Bernardini, Silvia, Adriana Ferraresi, Mariachiara Russo, Camille Collard and Bart Defrancq. 2018. “Building Interpreting and Intermodal Corpora: A How-to for a Formidable Task”. In: Making Way in Corpus-based Interpreting Studies. Ed. by Mariachiara Russo, Claudio Bendazzoli and Bart Defrancq. Singapore: Springer. pp. 21–42.
Chesterman, Andrew. 2004. “Beyond the particular”. In: Translation Universals: Do they exist? Ed. by Anna Mauranen and Pekka Kujamäki. Amsterdam: John Benjamins. Benjamins Translation Library 48.
Crocker, Matthew, Vera Demberg and Elke Teich. 2016. “Information Density and Linguistic Encoding (IDeaL)”. KI – Künstliche Intelligenz, 30(1): 77–81.
Degaetano-Ortlieb, Stefania and Elke Teich. 2018. “Using relative entropy for detection and analysis of periods of diachronic linguistic change”. In: Proceedings of the 2nd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, COLING 2018. Santa Fe, NM, USA.
Degaetano-Ortlieb, Stefania and Elke Teich. 2019. “Toward an optimal code for communication: The case of scientific English”. Corpus Linguistics and Linguistic Theory (advance online publication).
Fankhauser, Peter, Jörg Knappen and Elke Teich. 2014. “Exploring and Visualizing Variation in Language Resources”. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14). Reykjavik, Iceland: European Language Resources Association (ELRA).
Gellerstam, Martin. 1986. “Translationese in Swedish novels translated from English”. In: Translation Studies in Scandinavia: Proceedings from the Scandinavian Symposium on Translation Theory (SSOTT). Ed. by Lars Wollin and Hans Lindquist. Lund, Sweden: CWK Gleerup, pp. 88–95.
Hughes, James M., Nicholas J. Foti, David C. Krakauer and Daniel N. Rockmore. 2012. “Quantitative patterns of stylistic influence in the evolution of literature”. Proceedings of the National Academy of Sciences 109(20): 7682–7686.
Jurafsky, Daniel and James H. Martin. 2008. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Upper Saddle River, NJ: Prentice Hall.
Kajzer-Wietrzny, Marta. 2012. “Interpreting universals and interpreting style”. PhD thesis. Adam Mickiewicz University, Poznań, Poland.
Karakanta, Alina, Mihaela Vela and Elke Teich. 2018. “Preserving Metadata from Parliamentary Debates”. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). Miyazaki, Japan: European Language Resources Association (ELRA).
Klingenstein, Sara, Tim Hitchcock and Simon DeDeo. 2014. “The civilizing process in London’s Old Bailey”. Proceedings of the National Academy of Sciences 111(26): 9419–9424.
Koehn, Philipp. 2005. “Europarl: a parallel corpus for statistical machine translation”. In: Proceedings of the Tenth Machine Translation Summit. Phuket, Thailand: Asia-Pacific Association for Machine Translation, pp. 79–86.
Koppel, Moshe and Noam Ordan. 2011. “Translationese and its dialects”. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL). Portland, Oregon: Association for Computational Linguistics, pp. 1318–1326.
Lapshinova-Koltunski, Ekaterina and Marcos Zampieri. 2018. “Linguistic features of genre and method variation in translation: a computational perspective”. In: The Grammar of Genres and Styles: From Discrete to Non-Discrete Units. Ed. by Dominique Legallois, Thierry Charnois, and Meri Larjavaara. Berlin, Boston: De Gruyter Mouton, pp. 92–117.
Monti, Cristina, Claudio Bendazzoli, Annalisa Sandrelli and Mariachiara Russo. 2005. “Studying Directionality in Simultaneous Interpreting through an Electronic Corpus: EPIC (European Parliament Interpreting Corpus)”. Meta 50(4).
Östling, Robert and Jörg Tiedemann. 2017. “Continuous multilinguality with language vectors”. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. Valencia, Spain: Association for Computational Linguistics, pp. 644–649.
Rabinovich, Ella and Shuly Wintner. 2015. “Unsupervised Identification of Translationese”. Transactions of the Association for Computational Linguistics 3: 419–432.
Rubino, Raphael, Ekaterina Lapshinova-Koltunski and Josef van Genabith. 2016. “Information Density and Quality Estimation Features as Translationese Indicators for Human Translation Classification”. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT). San Diego, California: Association for Computational Linguistics.
Sandrelli, Annalisa and Claudio Bendazzoli. 2005. “Lexical Patterns in Simultaneous Interpreting: A Preliminary Investigation of EPIC (European Parliament Interpreting Corpus)”. In: Proceedings from the Corpus Linguistics Conference Series 1. Birmingham, UK: University of Birmingham.
Teich, Elke. 2003. Cross-linguistic Variation in System and Text: A Methodology for the Investigation of Translations and Comparable Texts. Mouton de Gruyter.
Teich, Elke, José Martínez Martínez and Alina Karakanta. 2020. “Translation, information theory and cognition”. In: Routledge Handbook of Translation and Cognition. Ed. by Fabio Alves and Arnt Lykke Jakobsen. London: Routledge, pp. 360–375.
Zou, Will Y., Richard Socher, Daniel Cer and Christopher D. Manning. 2013. “Bilingual Word Embeddings for Phrase-Based Machine Translation”. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Seattle, Washington, USA: Association for Computational Linguistics, pp. 1393–1398.