Shared task report
Machine learning for learner English
A plea for creating learner data challenges
Carlos Balhana | University of Cambridge
Theodora Alexopoulou | University of Cambridge
Thomas Gaillat | Universités de Rennes 1&2, LIDILE
This paper discusses machine learning techniques for the prediction of Common European Framework of Reference (CEFR) levels in a learner corpus. We summarise the CAp 2018 Machine Learning (ML) competition, a classification task over the CEFR scale, which maps linguistic competence in a foreign language onto six reference levels. The goal of this competition was to produce a machine learning system that predicts learners’ competence levels from written productions of between 20 and 300 words, together with a set of characteristics computed for each text. The texts were extracted from the French component of the EFCAMDAT data (Geertzen et al., 2013). Together with the description of the competition, we provide an analysis of the results and methods proposed by the participants and discuss the benefits of this kind of competition for the learner corpus research (LCR) community. The main findings address the methods used and the lexical bias introduced by the task.
Keywords: natural language processing (NLP), machine learning, learners of English, EFCAMDAT corpus, CEFR, language proficiency
Article outline
- 1. Introduction
- 2. Learner corpora in shared tasks
- 3. Aims of the competition
- 4. Data set description
- 4.1 Features provided
- 4.2 Training labels
- 4.3 Competition landmarks
- 5. Evaluation
- 6. Results and discussion
- 6.1 The leader board
- 6.2 Analysis of the different proposed solutions
- 6.2.1 Features used
- 6.2.2 Representation
- 6.2.3 Classification methods
- 6.2.4 The software used
- 6.2.5 Methodological concerns
- 6.3 Analysis of the results
- 6.4 Lessons learnt
- 7. Conclusions
- Acknowledgements
- Notes
- References
Published online: 14 April 2020
https://doi.org/10.1075/ijlcr.18012.bal
References
Alexopoulou, T., Michel, M., Murakami, A., & Meurers, D.
Alexopoulou, T., Yannakoudakis, H., & Salamoura, A.
Attali, Y. & Burstein, J.
Barker, F., Salamoura, A., & Saville, N.
Baur, C., Caines, A., Chua, C., Gerlach, J., Qian, M., Rayner, M., Russell, M., Strik, H., & Wei, X.
Baur, C., Chua, C., Gerlach, J., Rayner, E., Russell, M., Strik, H., & Wei, X.
Boyd, A., Hana, J., Nicolas, L., Meurers, D., Wisniewski, K., Abel, A., Schöne, K., Stindlová, B., & Vettori, C.
Callies, M. & Paquot, M.
Chen, X. & Meurers, D.
Council of Europe
Crossley, S. A., Salsbury, T., McNamara, D. S., & Jarvis, S.
Cushing Weigle, S.
Dahlmeier, D., Ng, H. T., & Wu, S. M.
Dale, R. & Kilgarriff, A.
Dale, R., Anisimoff, I., & Narroway, G.
Díaz-Negrillo, A., Ballier, N., & Thompson, P.
Flach, P.
Friedman, J., Hastie, T., & Tibshirani, R.
Geertzen, J., Alexopoulou, T., & Korhonen, A.
Goldberg, Y.
Granger, S., Kraif, O., Ponton, C., Antoniadis, G., & Zampa, V.
Hawkins, J. A. & Buttery, P.
Hawkins, J. A. & Filipović, L.
Higgins, D., Ramineni, C., & Zechner, K.
Hopman, E., Thompson, B., Austerweil, J., & Lupyan, G.
Jarvis, S. & Paquot, M.
Jarvis, S.
Le, Q. V. & Mikolov, T.
Leacock, C., Chodorow, M., Gamon, M., & Tetreault, J.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollár, P.
Lissón, P. & Ballier, N.
Lissón, P.
Magerman, D. M.
Malmasi, S., Evanini, K., Cahill, A., Tetreault, J., Pugh, R., Hamill, C., Napolitano, D., & Qian, Y.
Meurers, D.
Michalke, M. 2017. koRpus: An R package for text analysis (Version 0.10–2). Available at: https://reaktanz.de/?c=hacking&s=koRpus (accessed October 2018).
Mons, B.
Murakami, A.
Murphy, K. P.
Ng, H. T., Wu, S. M., Briscoe, T., Hadiwinoto, C., Susanto, R. H., & Bryant, C.
Nissim, M., Abzianidze, L., Evang, K., van der Goot, R., Haagsma, H., Plank, B., & Wieling, M.
O’Keeffe, A. & Mark, G.
Page, E. B.
Paquot, M. & Plonsky, L.
Paroubek, P., Chaudiron, S., & Hirschman, L.
Rich, A., Popp, P. O., Halpern, D., Rothe, A., & Gureckis, T.
Sang, E. F. & De Meulder, F.
Settles, B.
Settles, B., Brust, C., Gustafson, E., Hagiwara, M., & Madnani, N.
Shermis, M. D., Burstein, J., Higgins, D., & Zechner, K.
Tetreault, J., Burstein, J., Kochmar, E., Leacock, C., & Yannakoudakis, H.
Thewissen, J.
Vajjala, S. & Loo, K.
Volodina, E., Pilán, I., & Alfter, D.
Wisniewski, K.
Yannakoudakis, H., Briscoe, T., & Medlock, B.