Article published in: Spanish Learner Corpus Research: Current trends and future perspectives
Edited by Margarita Alonso-Ramos
[Studies in Corpus Linguistics 78] 2016
pp. 33–52
What is missing in learner corpus design?
This chapter discusses what is missing in learner corpus design. Learner corpus researchers are sometimes not fully aware of the basic principles of corpus design and collection that most corpus linguists should know. I will first discuss theoretical and methodological issues related to learner corpus design and collection, focusing on sampling, representativeness, and corpus size. I will then review three relevant studies (Biber 1993; Tomasello & Stahl 2004; Mukherjee & Rohrbach 2006) to clarify corpus design issues such as the parameters of corpus sampling, the effects of sample size, and variation in learner corpus design. Finally, the chapter concludes with a critical assessment and a discussion of future directions for design and data collection in learner corpus research.
Keywords: corpus design criteria, sample size, representativeness, data collection
Published online: 16 December 2016
Atkins, B.T.S., Clear, J. & Ostler, N.
Bybee, J. & Hopper, P.
Johansson, S., Leech, G. & Goodluck, H.
McEnery, T., Xiao, R. & Tono, Y.
Mukherjee, J. & Rohrbach, J.-M.
2005 Corpus and text – Basic principles. In Developing Linguistic Corpora: a Guide to Good Practice, M. Wynne (ed.), 1–16. Oxford: Oxbow Books. http://www.ahds.ac.uk/creating/guides/linguistic-corpora/chapter1.htm (25 May 2013).
Tomasello, M. & Stahl, D.
2012 International Corpus of Crosslinguistic Interlanguage: Project overview and a case study on the acquisition of new verb co-occurrence patterns. In Developmental and Crosslinguistic Perspectives in Learner Corpus Research [Tokyo University of Foreign Studies 4], Y. Tono, Y. Kawaguchi & M. Minegishi (eds), 27–46. Amsterdam: John Benjamins.