Shared task report
Machine learning for learner English
A plea for creating learner data challenges
This paper discusses machine learning techniques for predicting Common European Framework of Reference (CEFR)
levels in a learner corpus. We summarise the CAp 2018 Machine Learning (ML) competition, a
classification task over the six CEFR levels, which map linguistic competence in a foreign language onto six reference levels. The goal of
the competition was to produce a machine learning system that predicts learners’ competence levels from written productions of between
20 and 300 words and from a set of characteristics computed for each text. The texts were extracted from the French component of the EFCAMDAT data (Geertzen et al., 2013). Together with the description of the competition, we provide an analysis of
the results and of the methods proposed by the participants, and we discuss the benefits of this kind of competition for the learner corpus research
(LCR) community. The main findings concern the methods used and the lexical bias introduced by the task.
Article outline
- 1. Introduction
- 2. Learner corpora in shared tasks
- 3. Aims of the competition
- 4. Data set description
- 4.1 Features provided
- 4.2 Training labels
- 4.3 Competition landmarks
- 5. Evaluation
- 6. Results and discussion
- 6.1 The leader board
- 6.2 Analysis of the different proposed solutions
- 6.2.1 Features used
- 6.2.2 Representation
- 6.2.3 Classification methods
- 6.2.4 The software used
- 6.2.5 Methodological concerns
- 6.3 Analysis of the results
- 6.4 Lessons learnt
- 7. Conclusions
- Acknowledgements
- Notes
- References
References
Abney, S.
2007 Semisupervised learning for computational linguistics. London: Chapman and Hall/CRC.
Alexopoulou, T., Michel, M., Murakami, A., & Meurers, D.
2017 Task effects on linguistic complexity and accuracy: A large-scale learner corpus analysis employing natural language processing techniques.
Language Learning, 67(S1), 180–208.
Alexopoulou, T., Yannakoudakis, H., & Salamoura, A.
2013 Classifying intermediate learner English: a data-driven approach to learner corpora. In
Twenty years of learner corpus research: Looking back, moving ahead (pp. 11–23). Belgium: Presses Universitaires de Louvain.
Attali, Y. & Burstein, J.
2006 Automated essay scoring with e-rater® v.2.
The Journal of Technology, Learning and Assessment, 4(3).
Balikas, G.
2018 Lexical bias in essay level prediction.
ArXiv e-prints.
Barker, F., Salamoura, A., & Saville, N.
2015 Learner corpora and language testing. In
S. Granger,
G. Gilquin, &
F. Meunier (Eds.),
The Cambridge Handbook of Learner Corpus Research (pp. 511–534). Cambridge: Cambridge University Press.
Baur, C., Caines, A., Chua, C., Gerlach, J., Qian, M., Rayner, M., Russell, M., Strik, H., & Wei, X.
2018 Overview of the 2018 spoken CALL shared task. In
Interspeech 2018, 2354–2358. Geneva: ISCA.
Baur, C., Chua, C., Gerlach, J., Rayner, E., Russell, M., Strik, H., & Wei, X.
2017 Overview of the 2017 spoken CALL shared task. In
Workshop on Speech and Language Technology in Education (SLaTE). Stockholm, Sweden.
Boyd, A., Hana, J., Nicolas, L., Meurers, D., Wisniewski, K., Abel, A., Schöne, K., Stindlová, B., & Vettori, C.
2014 The MERLIN corpus: Learner language and the CEFR. In
LREC, 1281–1288. Reykjavik, Iceland.
Chen, X. & Meurers, D.
2016 CTAP: A web-based tool supporting automatic complexity analysis. In
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC), 113–119.
Council of Europe
2001a Common European Framework of Reference for Languages: Learning, teaching, assessment. Strasbourg, Language Policy Division: Cambridge University Press.
Council of Europe
2001b Common European Framework of Reference for Languages: Learning, teaching, assessment. Structured overview of all CEFR scales. Strasbourg, Language Policy Division: Cambridge University Press.
Council of Europe
2018 Common European Framework of Reference for Languages: Learning, teaching, assessment; Companion volume with new descriptors. Strasbourg, Language Policy Division: Cambridge University Press.
Crossley, S. A., Salsbury, T., McNamara, D. S., & Jarvis, S.
2011 Predicting lexical proficiency in language learner texts using computational indices.
Language Testing, 28(4), 561–580.
Cushing Weigle, S.
2010 Validation of automated scores of TOEFL iBT tasks against non-test indicators of writing ability.
Language Testing, 27(3), 335–353.
Dahlmeier, D., Ng, H. T., & Wu, S. M.
2013 Building a large annotated corpus of learner English: The NUS corpus of learner English. In
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications, 22–31. Association for Computational Linguistics. Atlanta, Georgia.
Dale, R. & Kilgarriff, A.
2011 Helping our own: The HOO 2011 pilot shared task. In
Proceedings of the 13th European Workshop on Natural Language Generation, ENLG ’11, 242–249. Association for Computational Linguistics. Nancy, France.
Dale, R., Anisimoff, I., & Narroway, G.
2012 HOO 2012: A report on the preposition and determiner error correction shared task. In
Proceedings of the Seventh Workshop on Building Educational Applications Using NLP, NAACL HLT ’12, 54–62. Association for Computational Linguistics. Montreal, Canada.
Díaz-Negrillo, A., Ballier, N., & Thompson, P.
Flach, P.
2012 Machine learning: The art and science of algorithms that make sense of data. Cambridge: Cambridge University Press.
Friedman, J., Hastie, T., & Tibshirani, R.
2001 The elements of statistical learning, volume 1. New York: Springer Series in Statistics.
Geertzen, J., Alexopoulou, T., & Korhonen, A.
2013 Automatic linguistic annotation of large scale L2 databases: The EF-Cambridge open language database (EFCAMDAT). In
Proceedings of the 31st Second Language Research Forum. Somerville, MA: Cascadilla Proceedings Project.
Goldberg, Y.
2017 Neural network methods for natural language processing. Synthesis Lectures on Human Language Technologies. San Rafael, CA: Morgan & Claypool Publishers.
Granger, S., Kraif, O., Ponton, C., Antoniadis, G., & Zampa, V.
2007 Integrating learner corpora and natural language processing: A crucial step towards reconciling technological sophistication and pedagogical effectiveness.
ReCALL, 19(3), 252–268.
Hawkins, J. A. & Buttery, P.
2010 Criterial features in learner corpora: Theory and illustrations.
English Profile Journal, 1(01).
Hawkins, J. A. & Filipović, L.
2012 Criterial features in L2 English: Specifying the reference levels of the Common European Framework, volume 1 of English Profile Studies. United Kingdom: Cambridge University Press.
Higgins, D., Ramineni, C., & Zechner, K.
2015 Learner corpora and automated scoring. In
S. Granger,
G. Gilquin, &
F. Meunier (Eds.),
The Cambridge Handbook of Learner Corpus Research (pp. 587–604). Cambridge: Cambridge University Press.
Hopman, E., Thompson, B., Austerweil, J., & Lupyan, G.
2018 Predictors of L2 word learning accuracy: A big data investigation. In
the 40th Annual Conference of the Cognitive Science Society (CogSci 2018), 513–518.
Jarvis, S. & Paquot, M.
2015 Learner corpora and native language identification. In
S. Granger,
G. Gilquin, &
F. Meunier (Eds.),
The Cambridge Handbook of Learner Corpus Research (pp. 605–628). Cambridge: Cambridge University Press.
Jarvis, S.
2011 Data mining with learner corpora. In
F. Meunier,
S. De Cock,
G. Gilquin, &
M. Paquot (Eds.),
A taste for corpora: In honour of Sylviane Granger (pp. 127–154). Amsterdam and Philadelphia: John Benjamins.
Le, Q. V. & Mikolov, T.
2014 Distributed representations of sentences and documents. ArXiv: 1405.4053.
Leacock, C., Chodorow, M., Gamon, M., & Tetreault, J.
2010 Automated grammatical error detection for language learners.
Synthesis Lectures on Human Language Technologies, 3(1), 1–134.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollár, P.
2017 Focal loss for dense object detection. In
Proceedings of the IEEE International Conference on Computer Vision, 2980–2988.
Lissón, P. & Ballier, N.
2018 Investigating learners’ progression in French as a foreign language: vocabulary growth and lexical diversity.
CUNY Student Research Day. Poster.
Lissón, P.
2017 Investigating the use of readability metrics to detect differences in written productions of learners: a corpus-based study.
Bellaterra Journal of Teaching & Learning Language & Literature, 10(4), 68–86.
Liu, B.
2012 Sentiment analysis and opinion mining. San Rafael, CA: Morgan & Claypool Publishers.
Lu, X.
2014 Computational methods for corpus annotation and analysis. New York: Springer.
Magerman, D. M.
1995 Statistical decision-tree models for parsing. In
Proceedings of the 33rd Annual Meeting on Association for Computational Linguistics, 276–283. Association for Computational Linguistics.
Malmasi, S., Evanini, K., Cahill, A., Tetreault, J., Pugh, R., Hamill, C., Napolitano, D., & Qian, Y.
2017 A report on the 2017 native language identification shared task. In
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, 62–75. Association for Computational Linguistics. Copenhagen, Denmark.
Meurers, D.
2015 Learner corpora and natural language processing. In
S. Granger,
G. Gilquin, &
F. Meunier (Eds.),
The Cambridge Handbook of Learner Corpus Research (pp. 537–566). Cambridge: Cambridge University Press.
Michalke, M.
2017 koRpus: An R package for text analysis (Version 0.10-2). Available at: [URL] (accessed October 2018).
Mons, B.
2018 Data stewardship for open science: Implementing FAIR principles. London: Chapman and Hall/CRC.
Murakami, A.
2014 Individual variation and the role of L1 in the L2 development of English grammatical morphemes: Insights from learner corpora. PhD thesis, University of Cambridge.
Murakami, A.
2016 Modeling systematicity and individuality in nonlinear second language development: The case of English grammatical morphemes.
Language Learning, 66(4), 834–871.
Murphy, K. P.
2012 Machine learning: A probabilistic perspective. Adaptive Computation and Machine Learning. Cambridge (MA): MIT Press.
Ng, H. T., Wu, S. M., Briscoe, T., Hadiwinoto, C., Susanto, R. H., & Bryant, C.
2014 The CoNLL-2014 shared task on grammatical error correction. In
Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, 1–14. Association for Computational Linguistics. Baltimore, Maryland.
Nissim, M., Abzianidze, L., Evang, K., van der Goot, R., Haagsma, H., Plank, B., & Wieling, M.
2017 Sharing is caring: The future of shared tasks.
Computational Linguistics, 43(4), 897–904.
Page, E. B.
1968 The use of the computer in analyzing student essays.
International Review of Education / Internationale Zeitschrift für Erziehungswissenschaft / Revue Internationale de l’Education, 14(2), 210–225.
Paroubek, P., Chaudiron, S., & Hirschman, L.
2007 Principles of evaluation in natural language processing.
Traitement Automatique des Langues, 48(1), 7–31.
Rich, A., Popp, P. O., Halpern, D., Rothe, A., & Gureckis, T.
2018 Modeling second-language learning from a psychological perspective. In
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications, 223–230.
Sang, E. F. & De Meulder, F.
2003 Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. arXiv preprint cs/0306050, 142–147.
Settles, B.
2018 Data for the 2018 Duolingo shared task on second language acquisition modeling (SLAM). Available at:
. (accessed October 2018).
Settles, B., Brust, C., Gustafson, E., Hagiwara, M., & Madnani, N.
2018 Second language acquisition modeling. In
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications, 56–65.
Shermis, M. D., Burstein, J., Higgins, D., & Zechner, K.
2010 Automated essay scoring: Writing assessment and instruction. In
P. Peterson,
E. Baker, &
B. McGaw (Eds.),
International Encyclopedia of Education (Third Edition) (pp. 20–26). Oxford: Elsevier.
Tetreault, J., Burstein, J., Kochmar, E., Leacock, C., & Yannakoudakis, H.
2018 Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications. Association for Computational Linguistics. New Orleans, Louisiana.
Thewissen, J.
2015 Accuracy across proficiency levels: A learner corpus approach. Louvain: Presses universitaires de Louvain.
Thrun, S. & Pratt, L.
1998 Learning to learn. Norwell, MA, USA: Kluwer Academic Publishers.
Vajjala, S. & Loo, K.
2014 Automatic CEFR level prediction for Estonian learner text. In
NEALT Proceedings Series, volume 221, 113–128.
Volodina, E., Pilán, I., & Alfter, D.
2016 Classification of Swedish learner essays by CEFR levels.
CALL Communities and Culture–Short Papers from EUROCALL 2016, 456–461.
Wisniewski, K.
2017 Empirical learner language and the levels of the Common European Framework of Reference.
Language Learning, 67(S1), 232–253.
Yannakoudakis, H., Briscoe, T., & Medlock, B.
2011 A new dataset and method for automatically grading ESOL texts. In
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies – Volume 1, HLT ’11, 180–189. Association for Computational Linguistics.
Yannakoudakis, H., Kochmar, E., Leacock, C., Madnani, N., Pilán, I., & Zesch, T.
2019 Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications. Association for Computational Linguistics. Florence, Italy.