Compositional translation of single-word complex terms using multilingual splitting

Clouet, Elizaveta; Harastani, Rima; Daille, Béatrice; Morin, Emmanuel

doi:10.1075/term.21.2.06clo

Article published in:

Terminology across Languages and Domains
Edited by Patrick Drouin, Natalia Grabar, Thierry Hamon and Kyo Kageura
[Terminology 21:2] 2015
pp. 263–291

Multilingual terminology acquisition from comparable corpora has attracted researchers' interest for twenty years, but challenges remain. Bilingual term alignment, a subtask of multilingual terminology acquisition, requires a pre-processing step because term structure may differ from one language to another: morphologically constructed terms need to be segmented before they can be aligned with their equivalents in other languages. This article addresses the translation of single-word complex terms using a compositional approach. We focus on the pre-processing of such terms and introduce a domain-oriented splitting method, which we apply to compound terms from two domains in four languages. The resulting segmentations are used as input to a translation step. We evaluate what percentage of segmentations can be correctly translated by a compositional approach, and which splitting strategy (precision-oriented or recall-oriented) performs better. The results are compared with those obtained using the reference segmentations and using a corpus-based splitting method. Our method yields results close to those of the reference segmentations and outperforms the corpus-based method.
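To make the split-then-translate pipeline described above more concrete, here is a minimal Python sketch of dictionary-based compound splitting followed by compositional translation. It is not the authors' domain-oriented method. The toy lexicon, the assumed German linking elements, the German-to-English dictionary, the target-side term list, and the function names `split` and `translate` are all hypothetical. Splitting enumerates every way to cut a word into known components (optionally joined by a linking element such as German -s-). Translation replaces each component with its dictionary equivalents, assumes the target language keeps the source order, and can filter candidates against a target-language term list.

```python
from itertools import product

# Hypothetical toy resources, not taken from the article:
# a domain lexicon of German components and a German-to-English dictionary.
LEXICON = {"wind", "kraft", "anlage", "kraftanlage"}
LINKERS = ("", "s", "n", "es")  # assumed German linking elements
BILINGUAL = {
    "wind": ["wind"],
    "kraft": ["power", "force"],
    "anlage": ["plant", "installation"],
    "kraftanlage": ["power plant"],
}


def split(word, min_len=3):
    """Return every segmentation of `word` into lexicon components.

    A component may be followed by one linking element; the last
    component must end the word exactly.
    """
    word = word.lower()
    results = [[word]] if word in LEXICON else []
    for i in range(min_len, len(word) - min_len + 1):
        head = word[:i]
        if head not in LEXICON:
            continue
        for linker in LINKERS:
            rest_start = i + len(linker)
            if word[i:rest_start] != linker:
                continue  # this linking element is not present here
            rest = word[rest_start:]
            if len(rest) < min_len:
                continue
            for tail in split(rest, min_len):
                results.append([head] + tail)
    return results


def translate(components, target_terms=None):
    """Compose candidate translations from component translations.

    Assumes the target language keeps the source (head-final) order,
    which holds for German-to-English noun compounds but not, e.g.,
    for German-to-French.
    """
    options = [BILINGUAL.get(c, []) for c in components]
    if not all(options):
        return []  # a component without a translation blocks composition
    candidates = [" ".join(combo) for combo in product(*options)]
    if target_terms is not None:
        # Keep only candidates attested in a target-language term list,
        # e.g. terms extracted from a comparable corpus (assumption).
        candidates = [c for c in candidates if c in target_terms]
    return candidates


if __name__ == "__main__":
    corpus_terms = {"wind power plant", "wind turbine"}  # hypothetical
    for segmentation in split("Windkraftanlage"):
        print(segmentation, translate(segmentation, corpus_terms))
```

With this toy data, `split("Windkraftanlage")` returns `[["wind", "kraftanlage"], ["wind", "kraft", "anlage"]]`. The three-part segmentation produces four unfiltered candidates, and both segmentations lead to the attested term "wind power plant" after filtering. Returning every segmentation favors recall; a precision-oriented variant could, for instance, keep only one segmentation per word.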

Keywords: bilingual terminology alignment, comparable corpora, compositional translation, compound splitting, complex terms

Published online: 31 December 2015
