A Description of the English-Norwegian Parallel Corpus: Compilation and Further Developments

Oksefjell, Signe

doi:10.1075/ijcl.4.2.01oks

Article published in: International Journal of Corpus Linguistics, Vol. 4:2 (1999), pp. 197–219

Signe Oksefjell | Department of British and American Studies, University of Oslo

This paper introduces the most important steps in compiling the English-Norwegian Parallel Corpus (ENPC), which contains 50 original English text extracts with their translations into Norwegian and 50 original Norwegian text extracts with their translations into English, about 2.6 million words in all. Although preparing the text extracts is the most time-consuming part of the process, much of the focus has also been on developing software, notably a browser for parallel texts and an alignment program that links the original and translated versions of the same text. Preparing the texts themselves involves scanning, proofreading, mark-up, and alignment. Although the ENPC itself is complete, the ENPC project is still developing, and this paper mentions the most recent extensions: adding more languages, compiling multiple translations (in the same language) of the same text, part-of-speech tagging, and marking direct speech and thought in the ENPC.

Keywords: Parallel Corpus, Alignment, Tagging, Mark-up, Browser for Parallel Texts

Published online: 12 May 2000
