Vol. 8:2 (2022), pp. 261–282
A multilingual learner corpus for less commonly taught languages
This article provides a detailed account of the framework and the pedagogical and research applications of the Multilingual Academic Corpus of Assignments – Writing and Speech (MACAWS). [1] MACAWS is a monitor learner corpus of written and oral assignments produced by foreign language learners in the context of their language learning classrooms. The corpus currently focuses on two less commonly taught languages rarely represented in learner corpora, Portuguese and Russian. It contains 124,054 words in Russian and 536,168 in Portuguese and is updated each semester as new texts are added. The online interface is designed for ease of use by teachers and students. Our novel interactive data-driven learning (iDDL) tool allows concordance lines to be embedded into websites and learning management systems (LMS), facilitating student interaction with them. Researchers can gain access to an offline corpus for greater flexibility.
Article outline
- 1. Introduction: Background and motivation
- 2. Data collection
- 2.1 Context of foreign language programs
- 2.2 Metadata: Course, assignment and learners
- 3. Corpus building
- 3.1 Processing and transcription
- 3.2 De-identification of texts
- 3.3 Corpus organization: Assignment, topic and macrogenre
- 4. Current corpus
- 4.1 Corpus statistics
- 4.2 Corpus interface
- 4.3 Interactive data-driven learning (iDDL)
- 5. Research and pedagogical applications
- 6. Limitations
- 7. Conclusion
- 8. Future directions
- Notes
- References
https://doi.org/10.1075/ijlcr.21001.som