Edited by Ruslan Mitkov, Johanna Monti, Gloria Corpas Pastor and Violeta Seretan
[Current Issues in Linguistic Theory 341] 2018
pp. 101–124
Multilingual Information Extraction requires substantial processing of Multiword Expressions (MWEs), as many of the relevant items are multiwords. The lexical representation of MWEs supports large bilingual lexicons (for Persian, Pashto, Turkish, Arabic); multiwords are represented like single words, extended by two annotations: the MWE head, and the lemma plus part of speech for each of the MWE parts. In text analysis, MWEs are recognised as part of the parsing process, not in separate pre- or post-processing components. The analysis design extends the X-bar scheme by a level for multiword rules. In transfer, MWEs are translated as elementary nodes, like single-word lemmata, so that key concepts can be presented for relevance judgement in Information Extraction. Evaluation shows that 90% of the MWE patterns in the lexicon can be analysed with about 150 MWE-specific rules, and that more than 90% of text document tokens are covered by the proposed integrated single and multiword processing.
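The lexical representation described above (an MWE entry shaped like a single-word entry, plus a head annotation and a lemma and part of speech for each part) could be sketched as follows. This is an illustration only, not the chapter's actual lexicon format: the class name `LexEntry`, its field names, the part-of-speech tags, and the English-German sample entries are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class LexEntry:
    """Hypothetical bilingual lexicon entry; field names are assumptions."""
    lemma: str          # citation form, single word or multiword
    pos: str            # part of speech of the entry as a whole
    translation: str    # target-language equivalent
    # The two MWE-specific annotations; left empty for single words.
    mwe_head: Optional[str] = None                                  # lemma of the head part
    mwe_parts: List[Tuple[str, str]] = field(default_factory=list)  # (lemma, pos) per part

    @property
    def is_mwe(self) -> bool:
        # An entry counts as a multiword if it carries part annotations.
        return bool(self.mwe_parts)


# Invented sample entries, for illustration only.
single = LexEntry(lemma="house", pos="N", translation="Haus")
multi = LexEntry(
    lemma="prime minister",
    pos="N",
    translation="Premierminister",
    mwe_head="minister",
    mwe_parts=[("prime", "Adj"), ("minister", "N")],
)

for entry in (single, multi):
    kind = "MWE" if entry.is_mwe else "single word"
    print(f"{entry.lemma} ({entry.pos}, {kind}) -> {entry.translation}")
```

Because both kinds of entry share one type, a transfer step in this sketch can look up `translation` without caring whether the source item was a single word or a multiword, which mirrors the idea of translating MWEs as elementary nodes.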