A flexible multi-layer corpus architecture: Competing target hypotheses in the Falko corpus

Reznicek, Marc; Lüdeling, Anke; Hirschmann, Hagen

doi:10.1075/scl.59.07rez

Part of

Automatic Treatment and Analysis of Learner Corpus Data
Edited by Ana Díaz-Negrillo, Nicolas Ballier and Paul Thompson
[Studies in Corpus Linguistics 59] 2013
pp. 101–124

Error annotation is a key feature of modern learner corpora. Error identification is always based on some kind of reconstructed learner utterance (target hypothesis). Since a single target hypothesis can only cover a certain amount of linguistic information while ignoring other aspects, the need for multiple target hypotheses becomes apparent. Using the German learner corpus Falko as an example, we therefore argue for a flexible multi-layer stand-off corpus architecture where competing target hypotheses can be coded in parallel. Surface differences between the learner text and the target hypotheses can then be exploited for automatic error annotation.
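The idea of deriving error annotation from surface differences can be illustrated with a minimal Python sketch. It is not the Falko implementation: the layer names `TH1` and `TH2`, the tag labels `CHA`, `INS` and `DEL`, the helper `diff_tags`, and the example sentences are all assumptions made for illustration. Each target hypothesis is stored as its own token layer alongside the unchanged learner text, and a generic sequence diff from the standard library marks where a hypothesis departs from what the learner wrote.

```python
from difflib import SequenceMatcher

# Illustrative labels (an assumption, not necessarily the corpus's tagset):
# CHA = tokens changed, DEL = learner tokens absent from the hypothesis,
# INS = hypothesis tokens absent from the learner text.
LABELS = {"replace": "CHA", "delete": "DEL", "insert": "INS"}


def diff_tags(learner, target):
    """Return (tag, learner_tokens, target_tokens) for each surface difference."""
    matcher = SequenceMatcher(a=learner, b=target, autojunk=False)
    edits = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            edits.append((LABELS[op], learner[i1:i2], target[j1:j2]))
    return edits


# The learner text is never modified; hypotheses live on separate layers.
learner = ["ich", "habe", "nach", "Hause", "gegangen"]
layers = {
    "TH1": ["ich", "bin", "nach", "Hause", "gegangen"],  # hypothetical minimal correction
    "TH2": ["ich", "bin", "heimgegangen"],               # hypothetical freer reformulation
}

# One automatically derived annotation layer per competing hypothesis.
annotations = {name: diff_tags(learner, tokens) for name, tokens in layers.items()}

for name, edits in annotations.items():
    print(name, edits)
```

Because every hypothesis is compared against the same untouched learner tokens, adding a further hypothesis only adds another layer. The two layers here yield different error spans for the same utterance, which is the kind of divergence that a single target hypothesis would hide.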

Published online: 18 December 2013
