Edited by Margarita Alonso-Ramos
[Studies in Corpus Linguistics 78] 2016
pp. 89–116
Chapter 4. PoS-tagging a Spanish oral learner corpus
Criteria, procedure, and a sample analysis
This chapter explains the methodology followed to part-of-speech (PoS) tag the Spanish oral learner corpus CORELE (Corpus Oral de Español como Lengua Extranjera; Campillos Llanos 2014). The data consist of forty interviews with learners at a lower intermediate level from more than nine mother tongue (L1) backgrounds, and four interviews with native speakers (the control group). The annotation was performed with the GRAMPAL tagger (Moreno & Guirao 2006). The learner corpus amounted to 52,759 lexical units (LUs) and the native corpus to 8,643 LUs. The corpus interface is available online and allows the user to explore learners’ interlanguage by searching the data by word form, lemma, L1, and/or proficiency level. I present a sample study of learners’ production of articles following the Contrastive Interlanguage Analysis approach (Granger 1996).
Article outline
- 1. Introduction
- 2. A brief overview of previous work
- 2.1 Part of Speech tagging learner corpora
- 2.2 Studies on articles in learner Spanish
- 3. Methodology
- 3.1 Corpus data
- 3.2 Part-of-Speech (PoS) tagging
- 3.3 Count of lexical units
- 3.4 The corpus interface
- 4. A sample analysis of learners’ production of Spanish articles
- 4.1 Motivation
- 4.2 Results
- 5. Discussion
- 6. Conclusions
- Acknowledgments
- Notes
- References
https://doi.org/10.1075/scl.78.04cam