Corpus report
A multilingual learner corpus for less commonly taught languages
This article provides a detailed account of the framework and the pedagogical and research applications of the Multilingual Academic Corpus of Assignments – Writing and Speech (MACAWS). MACAWS is a monitor learner corpus of written and oral assignments produced by foreign language learners in their language classrooms. The corpus currently focuses on two less commonly taught languages that are rarely represented in learner corpora, Portuguese and Russian. It contains 124,054 words in Russian and 536,168 words in Portuguese and is updated each semester as new texts are added. The online interface is designed for ease of use by teachers and students. Our novel interactive data-driven learning (iDDL) tool allows concordance lines to be embedded in websites and learning management systems (LMS), making it easier for students to interact with them. Researchers can gain access to an offline version of the corpus for greater flexibility.
Article outline
- 1. Introduction: Background and motivation
- 2. Data collection
- 2.1 Context of foreign language programs
- 2.2 Metadata: Course, assignment and learners
- 3. Corpus building
- 3.1 Processing and transcription
- 3.2 De-identification of texts
- 3.3 Corpus organization: Assignment, topic and macrogenre
- 4. Current corpus
- 4.1 Corpus statistics
- 4.2 Corpus interface
- 4.3 Interactive data-driven learning (iDDL)
- 5. Research and pedagogical applications
- 6. Limitations
- 7. Conclusion
- 8. Future directions
- Notes
- References