Parallel corpora make sense
Bypassing the knowledge acquisition bottleneck for Word Sense Disambiguation
We present a multilingual approach to Word Sense Disambiguation (WSD), which automatically assigns the contextually appropriate sense to a given word. Instead of using a predefined monolingual sense inventory, we use a language-independent framework by deriving the senses of a given word from word alignments on a multilingual parallel corpus, which we made available for corpus linguistics research. We built five WSD systems with English as the input language and translations in five supported languages (viz. French, Dutch, Italian, Spanish and German) as senses. The systems incorporate both binary translation features and local context features. The experimental results are very competitive, which confirms our initial hypothesis that each language contributes to the disambiguation of polysemous words. Because our system extracts all information from the parallel corpus, it offers a flexible language-independent approach, which implicitly deals with the sense distinctness issue and allows us to bypass the knowledge acquisition bottleneck for WSD.
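The idea of treating aligned translations as sense labels, and of combining binary translation features with local context features, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy alignments, the function names, the bag-of-words reading of "binary translation features", and the one-word context window are all assumptions made for illustration.

```python
from collections import Counter

# Toy word alignments (English word, French word); invented for illustration.
ALIGNMENTS = [
    ("bank", "banque"),
    ("bank", "rive"),
    ("bank", "banque"),
    ("money", "argent"),
]


def sense_inventory(alignments, focus_word):
    """Count the translations aligned with focus_word.

    Each distinct translation acts as one sense label (hypothetical helper).
    """
    return Counter(tgt for src, tgt in alignments if src == focus_word)


def binary_translation_features(sentence_translation, vocabulary):
    """Return a 0/1 vector: 1 if the vocabulary word occurs in the translation.

    Assumes whitespace tokenisation; a real system would use a proper tokenizer.
    """
    tokens = set(sentence_translation.lower().split())
    return [1 if word in tokens else 0 for word in vocabulary]


def local_context_features(tokens, index, window=1):
    """Return the words within `window` positions of the focus word.

    Sentence edges are padded with a placeholder token.
    """
    padded = ["<pad>"] * window + tokens + ["<pad>"] * window
    centre = index + window
    return padded[centre - window:centre] + padded[centre + 1:centre + window + 1]


if __name__ == "__main__":
    print(sense_inventory(ALIGNMENTS, "bank"))
    print(binary_translation_features("Il a ouvert un compte", ["compte", "rive"]))
    print(local_context_features(["the", "bank", "closed"], 1))
```

In this sketch the sense inventory comes entirely from the aligned data, so no hand-built lexicon is needed; a classifier would then be trained on the concatenated feature vectors, one system per target language.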
Cited by 1 other publication
This list is based on CrossRef data as of 15 April 2022. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.