Chapter published in:
Parallel Corpora for Contrastive and Translation Studies: New resources and applications
Edited by Irene Doval and M. Teresa Sánchez Nieto
[Studies in Corpus Linguistics 90] 2019
pp. 233–247
An overview of Basque corpora and the extraction of certain multi-word expressions from a translational corpus
Zuriñe Sanz-Villar | University of the Basque Country (UPV/EHU)
Since the 1980s, considerable efforts have been made to create different types of Basque corpora. However, to systematically analyse the Basque translations of German literary texts, it was necessary to create a corpus from the ground up. Intermediary versions were included in this corpus whenever the Basque target text was not a translation from the German original but came instead from a translation into another language (Spanish in most cases). A tool called TAligner was used to align the bitexts and the tritexts. The aim of this chapter is, firstly, to provide the reader with an overview of the main Basque corpora. Secondly, I will describe the design and compilation process of a parallel and multilingual corpus using TAligner 3.0. Thirdly, I will present how the corpus has been lemmatized and annotated at the level of part-of-speech. Finally, the process of extracting potential Basque multi-word expressions will be shown.
Keywords: Basque corpora, Aleuska corpus, TAligner, Basque MWEs
Article outline
- 1. Introduction
- 2. An overview of Basque corpora
- 3. Design, compilation and annotation of the Aleuska corpus
- 4. Extraction of MWEs
- 5. Conclusion
- Notes
- References
Published online: 20 March 2019
https://doi.org/10.1075/scl.90.14san