Indexation and analysis of a parallel corpus using CQPweb
Ulrike Oster | Universitat Politècnica de València, Universitat Jaume I
This contribution presents a section of the Corpus Valencià de Literatura Traduïda (COVALT), created by the research group of the same name (Department of Translation and Communication, Universitat Jaume I, Spain). COVALT is a four-million-word corpus made up of narrative works originally written in English, French, and German and their Catalan translations published in the autonomous community of Valencia between 1990 and 2000. Since the members of the COVALT group are interested in translation research, and more specifically in the investigation of translated Catalan and Spanish, the corpus has recently been extended to include translations into Spanish published in Spain (the COVALT PAR_ES corpus). This chapter presents the COVALT PAR_ES corpus and describes how it was compiled and how it can be analysed with CQPweb.
Article outline
- 1. Introduction
- 2. The corpora
- 3. Corpus compilation and indexation
- 3.1 Preparation of texts
- 3.2 Uploading the files to CQPweb
- Step 1: Creating directories
- Step 2: Encoding and indexing corpora in CWB
- Step 3: Aligning the subcorpora
- Step 4: Copying the files to CQPweb
- Step 5: Activating the corpora on the web interface
- 4. Corpus analysis
- 5. Conclusion
