The Kansas Developmental Learner corpus (KANDEL): A developmental corpus of learner German

Vyatkina, Nina

doi:10.1075/ijlcr.2.1.04vya

Article published in:

International Journal of Learner Corpus Research
Vol. 2:1 (2016), pp. 101–119

Nina Vyatkina | The University of Kansas

This article presents the Kansas Developmental Learner corpus (KANDEL), a corpus of L2 German writing samples produced by several cohorts of North American university students over four semesters of instructed language study. The corpus adds to the number of freely and publicly available learner corpora and deepens them with a unique set of features: it focuses on an L2 other than English (German), targets beginning to intermediate L2 proficiency levels, and includes dense developmental data together with annotations for multiple linguistic variables, learner errors, and over twenty learner and task variables. The article also reports the procedure and results of an inter-annotator agreement study, along with an in-depth analysis of annotator disagreement. In this way, it contributes to best practices for annotating learner corpora by making the annotation process transparent and demonstrating its reliability.

Keywords: beginning and intermediate L2 proficiency, inter-annotator agreement, error annotation, longitudinal learner corpus, L2 German, written corpus

Published online: 19 July 2016

