The IJS-ELAN Slovene-English Parallel Corpus
The paper presents an annotated parallel Slovene-English corpus developed within the EU ELAN project. The IJS-ELAN corpus was compiled as a widely distributable dataset for language engineering and for translation and terminology studies. It contains 1 million words from fifteen recent terminology-rich texts. The corpus is sentence-aligned and word-tagged with context-disambiguated morphosyntactic descriptions and lemmas. These descriptions model simple feature structures whose structure is shared between Slovene and English. The corpus is encoded according to the Guidelines for Text Encoding and Interchange and is freely available for download on the Web. IJS-ELAN can also be accessed through a powerful Web concordancer.
Keywords: parallel corpus, corpus encoding, tagging, concordancing
Published online: 18 October 2002