Chapter published in:
Multiword Units in Machine Translation and Translation Technology
Edited by Ruslan Mitkov, Johanna Monti, Gloria Corpas Pastor and Violeta Seretan
[Current Issues in Linguistic Theory 341] 2018
pp. 126–145
A multilingual gold standard for translation spotting of German compounds and their corresponding multiword units in English, French, Italian and Spanish
Simon Clematide | Institute of Computational Linguistics, University of Zurich
Stéphanie Lehner | Institute of Computational Linguistics, University of Zurich
Johannes Graën | Institute of Computational Linguistics, University of Zurich
Martin Volk | Institute of Computational Linguistics, University of Zurich
This article describes a new word alignment gold standard for German nominal compounds and their multiword translation
equivalents in English, French, Italian, and Spanish. The gold standard contains alignments for each of the ten
language pairs, resulting in a total of 8,229 bidirectional alignments. It covers 362 occurrences of 137 different
German compounds randomly selected from the corpus of European Parliament plenary sessions, sampled according to the
criteria of frequency and morphological complexity. The gold standard supports the evaluation and optimisation of
automatic word alignments in the context of spotting translations of German compounds. The study also shows that, in
this text genre, around 80% of German noun types are morphological compounds, indicating potential multiword units in
their parallel equivalents.
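
The figure of ten language pairs follows from choosing two of the five languages covered (German, English, French, Italian, Spanish). The following is a minimal Python sketch of that count, assuming the pairs are unordered because the alignments are bidirectional; the names `LANGUAGES` and `language_pairs` are illustrative and do not come from the chapter.

```python
from itertools import combinations

# The five languages named in the abstract.
LANGUAGES = ["German", "English", "French", "Italian", "Spanish"]


def language_pairs(languages):
    """Return every unordered pair of distinct languages (hypothetical helper)."""
    return list(combinations(languages, 2))


pairs = language_pairs(LANGUAGES)

# List each pair, then the total number of pairs.
for first, second in pairs:
    print(f"{first}-{second}")
print(len(pairs))
```

Four of these pairs involve German directly; the remaining six connect the non-German languages with each other.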
Keywords: gold standard, word alignment, compounding, multilinguality, German, English, Spanish, Italian, French
Published online: 20 July 2018
https://doi.org/10.1075/cilt.341.06cle