Compiling Parallel Text Corpora: Towards Automation of Routine Procedures

Mihailov, Mihail; Tommola, Hannu

doi:10.1075/ijcl.6.si.07mih

Article published in:

Text Corpora and Multilingual Lexicography
Wolfgang Teubert
[International Journal of Corpus Linguistics 6:SI] 2001
pp. 67–77

The research project under way at the Department of Translation Studies of the University of Tampere aims to compile a Russian-Finnish parallel corpus of fiction. The corpus will be equipped with efficient search and analysis tools. The texts will be stored as ordinary text files, and each text will be registered in a Microsoft Access database and supplied with a description. Automated parallel concordancing is being developed for the corpus. The program will find the keywords in text A (Russian), look up possible translation equivalents of those keywords in language B (Finnish), and then search for the portion of text B where most of those equivalents occur.
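The concordancing procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' program: the function name `find_best_window`, the fixed-size sliding window, the toy lexicon, and the assumption that the Finnish tokens are already lemmatized and lowercased are all hypothetical simplifications introduced here. A real system would need morphological analysis, since Finnish and Russian are both heavily inflected.

```python
from typing import Dict, List, Set, Tuple


def find_best_window(
    source_keywords: List[str],
    lexicon: Dict[str, Set[str]],
    target_tokens: List[str],
    window_size: int,
) -> Tuple[int, int]:
    """Return (start, hits): the start index of the target-text window that
    covers the most distinct source keywords, and how many it covers."""
    # Map each candidate translation back to the source keywords it may render.
    back: Dict[str, Set[str]] = {}
    for keyword in source_keywords:
        for candidate in lexicon.get(keyword, set()):
            back.setdefault(candidate, set()).add(keyword)

    best_start, best_hits = 0, -1
    last_start = max(len(target_tokens) - window_size, 0)
    for start in range(last_start + 1):
        window = target_tokens[start:start + window_size]
        # Collect the distinct source keywords that have an equivalent here.
        matched: Set[str] = set()
        for token in window:
            matched |= back.get(token.lower(), set())
        # Strict comparison keeps the earliest window when several tie.
        if len(matched) > best_hits:
            best_start, best_hits = start, len(matched)
    return best_start, best_hits


# Hypothetical toy data: Russian keywords, a tiny Russian-to-Finnish lexicon,
# and a lemmatized Finnish token sequence.
keywords = ["дом", "река"]
lexicon = {"дом": {"talo", "koti"}, "река": {"joki"}}
finnish = ["hän", "asua", "kaupunki", "vanha", "talo",
           "olla", "lähellä", "joki", "ja", "metsä"]

start, hits = find_best_window(keywords, lexicon, finnish, window_size=5)
print(start, hits, finnish[start:start + 5])
```

Counting distinct source keywords rather than raw token matches keeps a window from scoring highly just because one frequent equivalent is repeated in it.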

Keywords: database, translation, Finnish, Russian, parallel concordance, keyword

Published online: 17 December 2001

https://doi.org/10.1075/ijcl.6.si.07mih