Article in: International Journal of Learner Corpus Research: Online-First Articles
The Core Metadata Schema for Learner Corpora (LC-meta)
Collaborative efforts to advance data discoverability, metadata quality and study comparability in L2 research
Metadata is critical throughout the research process, from study design to corpus selection/compilation, result
interpretability and cumulative research. To date, however, learner corpus research has not developed community standards or best
practices for metadata collection and sharing. In this article, we present the results of a collaborative project aimed at
addressing this issue by developing a standardised metadata schema for learner corpora. We first describe the procedure
implemented to design the schema, including the ways in which we continuously involved learner corpus researchers in this
initiative. We then introduce the Core Metadata Schema for Learner Corpora (LC-meta, Version 2), which consists
of a set of obligatory and optional variables that encapsulate crucial information about L2 data (administrative details, corpus
design, text-related variables, learner-related variables, annotations, annotators, or transcribers). Finally, we discuss future
developments and emphasise the importance of continued maintenance and further refinement of this schema by the research
community.
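
To illustrate the idea of a schema built from obligatory and optional variables grouped into components, the following is a minimal sketch in Python. It is not the LC-meta schema itself: the variable names (`corpus_name`, `target_language`, `task_type`, `first_language`, `annotation_scheme`), the `Variable` class and the `check_record` helper are all hypothetical and chosen only for illustration. Only the component labels are taken from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """One metadata variable; names below are hypothetical, not from LC-meta."""
    name: str
    component: str   # component label, as listed in the abstract
    obligatory: bool


# Illustrative schema only: an assumed handful of variables, not the real list.
SCHEMA = [
    Variable("corpus_name", "administrative details", obligatory=True),
    Variable("target_language", "corpus design", obligatory=True),
    Variable("task_type", "text-related variables", obligatory=False),
    Variable("first_language", "learner-related variables", obligatory=True),
    Variable("annotation_scheme", "annotations", obligatory=False),
]


def check_record(record, schema=SCHEMA):
    """Report obligatory variables that are missing or empty,
    and keys in the record that the schema does not define."""
    known = {v.name for v in schema}
    missing = [
        v.name for v in schema
        if v.obligatory and record.get(v.name) in (None, "")
    ]
    unknown = sorted(k for k in record if k not in known)
    return {"missing": missing, "unknown": unknown}


# A deliberately incomplete record with one key the schema does not know.
record = {
    "corpus_name": "Example Corpus",
    "target_language": "en",
    "proficiency": "B1",
}
report = check_record(record)
print(report)
```

A check of this kind could flag, before a corpus is shared, which obligatory variables still need to be documented and which locally used variables have no counterpart in the shared schema.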
Keywords: Learner corpus, metadata, community standards, FAIR principles, data discoverability, corpus compilation, study comparability
Article outline
- 1. Introduction
- 2. Development of the Core Metadata Schema for Learner Corpora (LC-meta)
- 3. The Core Metadata Schema for Learner Corpora (LC-meta)
- 3.1 General structure of the schema: Eight interrelated components
- 3.2 Metadata elements: Design principles and key characteristics
- 4. Future developments
- 5. Conclusion
- Open material badge
- Acknowledgements
- Notes
- Author queries
- References
This content is being prepared for publication; it may be subject to changes.