Article published in: Terminology across Languages and Domains
Edited by Patrick Drouin, Natalia Grabar, Thierry Hamon and Kyo Kageura
[Terminology 21:2] 2015
pp. 263–291
Compositional translation of single-word complex terms using multilingual splitting
Multilingual terminology acquisition from comparable corpora has attracted the interest of researchers for twenty years, but challenges remain. Bilingual term alignment, a subtask of multilingual terminology acquisition, requires a pre-processing step because term structure may differ from one language to another. Morphologically constructed terms must be segmented before they can be aligned with their equivalents in other languages. This article addresses the translation of complex terms using a compositional approach. We focus on the pre-processing of such terms and introduce a domain-oriented splitting method that we apply to compound terms belonging to two domains and four languages. The segmentations are used as input to a translation step. We evaluate what percentage of segmentations can be correctly translated by a compositional approach, and which splitting strategy (precision-oriented or recall-oriented) performs better. The results are compared to those obtained with the reference segmentations and with a corpus-based splitting method. Our method is close to the reference segmentation and outperforms the corpus-based method.
Keywords: bilingual terminology alignment, comparable corpora, compositional translation, compound splitting, complex terms
Published online: 31 December 2015
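The pipeline outlined in the abstract (split a single-word complex term, then translate it component by component) can be illustrated with a minimal sketch. This is not the article's domain-oriented splitting method: the splitter below is a plain exhaustive lexicon lookup, and the component lexicon, bilingual dictionary, target term list, and the function names `split` and `translate` are all hypothetical toy values chosen for illustration. Filtering candidates against a list of attested target terms is likewise an assumption about how such a translation step might be validated.

```python
from itertools import product

# Hypothetical toy resources for a German-to-English example.
# None of these entries come from the article.
COMPONENTS = {"wind", "energie", "anlage"}
BILINGUAL = {
    "wind": ["wind"],
    "energie": ["energy", "power"],
    "anlage": ["plant"],
}
# Assumed list of terms attested on the target side.
TARGET_TERMS = {"wind energy", "wind power plant"}


def split(term, lexicon):
    """Return every segmentation of `term` into entries of `lexicon`."""
    if not term:
        return [[]]  # one way to segment the empty string: no parts
    results = []
    for i in range(1, len(term) + 1):
        head = term[:i]
        if head in lexicon:
            for rest in split(term[i:], lexicon):
                results.append([head] + rest)
    return results


def translate(term):
    """Translate each component, combine, and keep attested candidates."""
    candidates = []
    for segmentation in split(term, COMPONENTS):
        # Translation options for each component, in source order.
        options = [BILINGUAL.get(part, []) for part in segmentation]
        for combo in product(*options):
            candidate = " ".join(combo)
            if candidate in TARGET_TERMS and candidate not in candidates:
                candidates.append(candidate)
    return candidates


print(translate("windenergie"))
print(translate("windenergieanlage"))
```

The sketch keeps components in source order and ignores linking elements and inflection, both of which a realistic splitter would need to handle. A term with no segmentation in the lexicon simply yields no candidates.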