Article published in:
The Mental Lexicon: Online-First Articles
NLP and education
Using semantic similarity to evaluate filled gaps in a large-scale Cloze test in the classroom
This study examines the applicability of the Cloze test, a widely used tool for assessing text comprehension
proficiency, and highlights the challenges of administering it at large scale. To address these limitations, it proposes an automated
correction approach that uses Natural Language Processing (NLP) techniques, particularly word embedding (WE)
models, to assess the semantic similarity between expected and provided answers. Using data from Cloze tests administered to students
in Brazil, WE models for Brazilian Portuguese (PT-BR) were employed to measure the semantic similarity of the responses. The
results were validated through an experimental setup involving twelve judges who classified the students’ answers. A comparative
analysis between the WE models’ scores and the judges’ evaluations revealed that GloVe was the most effective model, demonstrating
the highest correlation with the judges’ assessments. This study underscores the utility of WE models in evaluating semantic
similarity and their potential to enhance large-scale Cloze test assessments. Furthermore, it contributes to educational
assessment methodologies by offering a more efficient approach to evaluating reading proficiency.
Keywords: Cloze test, word embeddings, semantic similarity
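The scoring idea summarized in the abstract can be illustrated with a minimal sketch. It assumes cosine similarity as the similarity measure and Pearson correlation for the comparison with judges; the abstract names neither, so both are assumptions, not the authors' documented method. The three-dimensional vectors, the word list, and the function names `cosine_similarity`, `score_answer`, and `pearson` are hypothetical; real PT-BR embeddings would be loaded from a pretrained model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors, for illustration only (not real embeddings).
TOY_EMBEDDINGS = {
    "casa": [0.9, 0.1, 0.0],
    "lar": [0.8, 0.2, 0.1],
    "carro": [0.1, 0.9, 0.2],
}

def score_answer(expected, provided, embeddings):
    """Score a filled gap against the expected word.

    Returns 1.0 for an exact match, None when either word has no
    vector, and the cosine similarity otherwise.
    """
    if expected == provided:
        return 1.0
    if expected not in embeddings or provided not in embeddings:
        return None
    return cosine_similarity(embeddings[expected], embeddings[provided])

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

if __name__ == "__main__":
    # A near-synonym should score higher than an unrelated word.
    print(score_answer("casa", "lar", TOY_EMBEDDINGS))
    print(score_answer("casa", "carro", TOY_EMBEDDINGS))
```

In such a setup, the model scores for a set of answers could be passed to `pearson` together with the judges' ratings for the same answers; answers for which `score_answer` returns None would need to be handled separately.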
Article outline
- Introduction
- 2. The Cloze procedure
- 3. Semantic similarity
- 4. Method
- 4.1 Participants
- 4.2 Cloze test procedure
- 4.3 Assessment of similarity by humans
- 4.4 Assessment of similarity by word embeddings models
- 4.5 Validation
- 5. Results
- 6. Conclusion
- Acknowledgements
- Note
- References
Published online: 10 January 2025
https://doi.org/10.1075/ml.24027.deg