A parallel corpus of 40 languages: InterCorp

Čermák, Petr

doi:10.1075/scl.90.06cer

Part of

Parallel Corpora for Contrastive and Translation Studies: New resources and applications
Edited by Irene Doval and M. Teresa Sánchez Nieto
[Studies in Corpus Linguistics 90] 2019
pp. 93–101


Petr Čermák | Charles University Prague

This chapter presents the current version of InterCorp, a parallel corpus created at the Faculty of Arts, Charles University in Prague. The corpus consists of Czech texts aligned with their versions in one or more foreign languages, covering Czech and 39 other languages in total. The chapter describes the structure and technical parameters of the corpus, as well as some of the tools used with it: Kontext, a corpus query interface, and InterText, a parallel text alignment editor created specifically for the project. It also presents Treq (Translation Equivalents Database), a collection of bilingual dictionaries pairing Czech with each foreign language, built automatically from InterCorp. The final section discusses how the corpus can be exploited for methodological and linguistic research.

Keywords: parallel corpus, InterCorp, comparison of languages, Spanish, Czech National Corpus

Article outline

1. Introduction
2. Description of the corpus
- 2.1 The Spanish part of the corpus
3. Using the corpus
4. Specific tools: Translation equivalents database
5. Exploiting InterCorp
6. Conclusion
Acknowledgment
References

Published online: 20 March 2019



References

Čermák, František, Corness, Patrick & Klégr, Aleš (eds). 2010. InterCorp: Exploring a Multilingual Corpus. Prague: Nakladatelství Lidové Noviny & Ústav Českého národního korpusu.

Čermák, František & Rosen, Alexandr. 2012. The case of InterCorp, a multilingual parallel corpus. International Journal of Corpus Linguistics 17(3): 411–427.

Čermák, Petr. 2007. Acerca de los corpora paralelos: El proyecto Intercorp (On parallel corpora: The InterCorp project). Verba 34: 375–380.

Machálek, Tomáš. 2016. Kontext. <[URL]> (18 November 2017).

Meurer, Paul. 2012. INESS-Search: A search system for LFG (and other) treebanks. In Proceedings of the LFG12 Conference, Miriam Butt & Tracy H. King (eds). Stanford, CA: CSLI Publications.

Nádvorníková, Olga. 2017. Pièges méthodologiques des corpus parallèles et comment les éviter (Methodological traps of parallel corpora and how to avoid them). Corela. Cognition, Représentation, Langage HS-21: 1–28.

Och, Franz Josef & Ney, Hermann. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics 29(1): 19–51.

Repository of bibliographical items based on the Czech National Corpus. 2017. <[URL]> (18 November 2017).

Rosen, Alexandr. 2016. InterCorp – a look behind the façade of a parallel corpus. In Polskojęzyczne korpusy równoległe. Polish-language Parallel Corpora, Ewa Gruszczyńska & Agnieszka Leńko-Szymańska (eds), 21–40. Warszawa: Instytut Lingwistyki Stosowanej.

Rosen, Alexandr & Vavřín, Martin. 2012. Building a multilingual parallel corpus for human users. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), Nicoletta Calzolari et al. (eds), 2447–2452. Istanbul: European Language Resources Association (ELRA).

Rosen, Alexandr & Vavřín, Martin. 2016. Korpus InterCorp, version 9 of 9 Sep 2016. Institute of the Czech National Corpus, Charles University, Prague 2014. <[URL]> (18 November 2017).

Škrabal, Michal & Vavřín, Martin. 2017. The Translation Equivalents Database (Treq) as a Lexicographer’s Aid. In Electronic lexicography in the 21st century. Proceedings of eLex 2017 conference, Iztok Kosem et al. (eds), 124–137. Leiden: Lexical Computing CZ s. r. o.

Štichauer, Pavel & Čermák, Petr. 2016. Causative constructions of the hacer / fare + verb type in Spanish and Italian and their Czech counterparts: A parallel corpus-based study. Linguistica Pragensia 26(2): 7–20.

TreeTagger. 2017. <[URL]> (18 November 2017).

Vondřička, Pavel. 2014. Aligning parallel texts with InterText. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), Nicoletta Calzolari et al. (eds), 1875–1879. Reykjavik: European Language Resources Association (ELRA).

Vondřička, Pavel. 2016. InterText, Parallel Text Alignment Editor. <[URL]> (18 November 2017).

Cited by (2)


Mikhailov, Mikhail. 2021. Mind the Source Data! Translation Equivalents and Translation Stimuli from Parallel Corpora. In New Perspectives on Corpus Translation Studies [New Frontiers in Translation Studies], pp. 259 ff.

Doval, Irene. 2018. Corpus paralelos en la enseñanza de lenguas extranjeras: un ejemplo de aplicación basado en el corpus PaGeS (Parallel corpora in foreign language teaching: An application example based on the PaGeS corpus). CLINA: Revista Interdisciplinaria de Traducción, Interpretación y Comunicación Intercultural 4(2): 65 ff.

This list is based on CrossRef data as of 27 July 2024 and may not be complete.