Chapter published in: Observing Eurolects: Corpus analysis of linguistic variation in EU law
Edited by Laura Mori
[Studies in Corpus Linguistics 86] 2018
pp. 28–45
The Eurolect Observatory Multilingual Corpus
Construction and query tools
This chapter explains the design of the Eurolect Observatory Multilingual Corpus and the steps required to build the monolingual corpora that the project needed to meet its research objectives. The first two sections after the general introduction point out the differences and overlaps among the corpora that the author produced as a member of the UNINT research team and that were used for text mining in the Eurolect Observatory Project. After defining the data collection and corpus building strategies adopted, the chapter describes the corpus search tool that was developed to help scholars find and save text samples from the whole corpus conveniently.
Keywords: natural language processing, corpus linguistics, AWK, corpus search tool, regular expressions, markup
Published online: 06 December 2018