Towards automation of routine procedures: Compiling parallel text corpora

Mihailov, Mihail; Tommola, Hannu

doi:10.1075/bct.8.07mih

Part of

Text Corpora and Multilingual Lexicography
Edited by Wolfgang Teubert
[Benjamins Current Topics 8] 2007
pp. 59–67

The aim of the research project under way at the Department of Translation Studies of the University of Tampere is to compile a Russian-Finnish parallel corpus of fiction. The corpus will be equipped with efficient search and analysis tools. The texts of the corpus will be stored as plain text files, and each text will be registered in a Microsoft Access database together with a description. Automated parallel concordancing is being developed for the corpus. The program will find the keywords in text A (Russian), look up possible translation equivalents of these keywords in language B (Finnish), and then search text B (Finnish) for the passage in which most of these equivalents occur.
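The three-step concordancing procedure can be illustrated with a minimal Python sketch. It does not reproduce the authors' program. Several details are assumptions made only for illustration: the toy dictionary `RU_FI`, the helper names (`tokens`, `keywords`, `score`, `align_passage`), treating every dictionary-known token as a keyword, matching inflected Finnish forms by stem prefix, and a fixed-size sliding window over the sentences of text B.

```python
import re
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy dictionary: Russian word form -> Finnish stems.
# A real system would need a full bilingual dictionary and lemmatisation.
RU_FI: Dict[str, List[str]] = {
    "кошка": ["kissa"],
    "спала": ["nukku"],
    "окне": ["ikkuna"],
    "собака": ["koira"],
    "лаяла": ["haukku"],
}

WORD = re.compile(r"\w+")  # str patterns match Cyrillic and Finnish letters


def tokens(text: str) -> List[str]:
    """Split a sentence into lowercased word tokens."""
    return [t.lower() for t in WORD.findall(text)]


def keywords(sentence_a: str, dictionary: Dict[str, List[str]]) -> List[str]:
    """Step 1 (assumption): keywords of text A are tokens found in the dictionary."""
    return [t for t in tokens(sentence_a) if t in dictionary]


def score(window: Sequence[str], equivalents: Dict[str, List[str]]) -> int:
    """Count keywords with at least one equivalent stem present in the window."""
    toks = [t for s in window for t in tokens(s)]
    return sum(
        any(tok.startswith(stem) for stem in stems for tok in toks)
        for stems in equivalents.values()
    )


def align_passage(sentence_a: str, text_b: List[str],
                  dictionary: Dict[str, List[str]], size: int = 2) -> Tuple[int, int]:
    """Return (start index, score) of the best window of `size` sentences in text B."""
    # Step 2: collect candidate Finnish equivalents for each Russian keyword.
    equivalents = {kw: dictionary[kw] for kw in keywords(sentence_a, dictionary)}
    best_start, best_score = 0, -1
    # Step 3: slide a window over text B; keep the first window matching most keywords.
    for start in range(max(1, len(text_b) - size + 1)):
        s = score(text_b[start:start + size], equivalents)
        if s > best_score:
            best_start, best_score = start, s
    return best_start, best_score


if __name__ == "__main__":
    text_b = [
        "Koira haukkui pihalla.",
        "Ilta oli hiljainen.",
        "Kissa nukkui ikkunalla.",
        "Aamulla satoi.",
    ]
    print(align_passage("Кошка спала на окне.", text_b, RU_FI, size=1))  # prints (2, 3)
```

Stem-prefix matching is a crude stand-in for morphological analysis of Finnish. It can produce false matches, and a full-text sliding window gets expensive on long novels. A practical tool would probably also restrict the search to the region of text B that corresponds roughly to the position of the hit in text A.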

Published online: 27 June 2007