Chapter published in: Computational Phraseology
Edited by Gloria Corpas Pastor and Jean-Pierre Colson
[IVITRA Research in Linguistics and Literature 24] 2020
pp. 136–150
Multiword expressions in comparable corpora
On the basis of the Aranea Gigaword Web corpora, a family of comparable corpora intended for use in contrastive linguistic research, multilingual lexicography, language teaching and translation studies, we discuss the pros and cons of comparable corpora in contrast to monolingual and parallel corpora for the analysis of multiword entities (MWEs). We demonstrate that large corpora for two or more languages, consisting of unrelated texts yet created in a comparable manner, make it possible to identify parallel language structures and phenomena such as MWEs, provided that appropriate tools are employed. For the Aranea corpora, the “bilingual sketch” functionality of the Sketch Engine is one such tool: it provides a new approach to analysing the similarities of (or differences between) the collocation profiles (word sketches) of words and their translation equivalents.
Keywords: comparable corpora, universal tagset, compatible Sketch Grammars, multiword expressions
Published online: 08 May 2020