Edited by Ruslan Mitkov, Johanna Monti, Gloria Corpas Pastor and Violeta Seretan
[Current Issues in Linguistic Theory 341] 2018
► pp. 147–162
As compounds pose a problem for applications that rely on precise word alignments, we developed a state-of-the-art compound splitter for Dutch that makes use of corpus frequency information and linguistic knowledge. Domain-adaptation techniques are used to combine large out-of-domain and dynamically compiled in-domain frequency lists.
As compounds are not always translated compositionally, we developed a novel methodology for word alignment. We train the word alignment models twice: a first time on the original data set and a second time on the data set in which the compounds are split into their component parts.
Experiments show that the compound splitter combined with the novel word alignment technique considerably improves bilingual terminology extraction results.