This chapter explains the design of the Eurolect Observatory Multilingual Corpus and the steps required to build the monolingual corpora that the project needed to meet its research objectives. The first two sections after the general introduction point out the differences and overlaps among the corpora that the author produced as a member of the UNINT research team and that were used for text mining in the Eurolect Observatory Project. After defining the data collection and corpus building strategies adopted, the chapter describes the corpus search tool developed to help scholars find and save text samples from the whole corpus conveniently.