The Kansas Developmental Learner corpus (KANDEL)
A developmental corpus of learner German
This article presents the Kansas Developmental Learner corpus (KANDEL), a corpus of L2 German writing samples produced by several cohorts of North American university students over four semesters of instructed language study. This corpus expands the number of freely and publicly available learner corpora while adding to the depth of these corpora with a unique set of features. It does so by focusing on an L2 other than English (German), targeting beginning to intermediate L2 proficiency levels, and including dense developmental data as well as annotations for multiple linguistic variables, learner errors, and over twenty learner and task variables. Furthermore, this article reports the procedure and results of an inter-annotator agreement study as well as an in-depth analysis of annotator disagreement. In this way, it contributes to best practices for annotating learner corpora by making the annotation process transparent and demonstrating its reliability.