The ALeSKo learner corpus: Design – annotation – quantitative analyses

Zinsmeister, Heike; Breckle, Margit

doi:10.1075/hsm.14.06zin

Part of

Multilingual Corpora and Multilingual Corpus Analysis
Edited by Thomas Schmidt and Kai Wörner
[Hamburg Studies on Multilingualism 14] 2012
pp. 71–96

The ALeSKo learner corpus is a small-scale comparable corpus consisting of two subcorpora: annotated essays by advanced Chinese learners of German and comparable essays by German native speakers. It was compiled to investigate discourse-related phenomena, such as local coherence, in the second-language acquisition of German. After describing how the texts were compiled and annotated, the article focuses on quantitative studies at the token level. We discuss problems of tokenisation and part-of-speech tagging and compare the inventories of the two subcorpora in terms of frequently used n-grams and lexical richness, among other aspects. We conclude by describing possible applications of the study in foreign language acquisition research and language teaching.

Published online: 15 November 2012
