NLP and education: Using semantic similarity to evaluate filled gaps in a large-scale Cloze test in the classroom

de Gois, Túlio Sousa; Freitas, Flávia Oliveira; Tejada, Julian; Freitag, Raquel Meister Ko.

doi:10.1075/ml.24027.deg

Article published in:

The Mental Lexicon: Online-First Articles

Túlio Sousa de Gois | Federal University of Sergipe

Flávia Oliveira Freitas | Federal University of Sergipe

Julian Tejada | Federal University of Sergipe

Raquel Meister Ko. Freitag | Federal University of Sergipe

This study examines the applicability of the Cloze test, a widely used tool for assessing text comprehension proficiency, while highlighting its challenges in large-scale implementation. To address these limitations, an automated correction approach was proposed, utilizing Natural Language Processing (NLP) techniques, particularly word embeddings (WE) models, to assess semantic similarity between expected and provided answers. Using data from Cloze tests administered to students in Brazil, WE models for Brazilian Portuguese (PT-BR) were employed to measure the semantic similarity of the responses. The results were validated through an experimental setup involving twelve judges who classified the students’ answers. A comparative analysis between the WE models’ scores and the judges’ evaluations revealed that GloVe was the most effective model, demonstrating the highest correlation with the judges’ assessments. This study underscores the utility of WE models in evaluating semantic similarity and their potential to enhance large-scale Cloze test assessments. Furthermore, it contributes to educational assessment methodologies by offering a more efficient approach to evaluating reading proficiency.

Keywords: Cloze test, word embeddings, semantic similarity
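The core idea of the abstract, scoring a student's answer by its semantic similarity to the expected word, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes cosine similarity as the measure (the abstract does not name one), and the function names, the `None` convention for out-of-vocabulary words, and the three-dimensional `TOY_EMBEDDINGS` vectors are all hypothetical. In practice the vectors would come from a pretrained PT-BR model such as GloVe.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # similarity with a zero vector is undefined; treat as 0
    return dot / (norm_u * norm_v)


# Toy vectors invented for illustration only (assumption); real word
# embeddings have hundreds of dimensions and are learned from a corpus.
TOY_EMBEDDINGS = {
    "casa": [0.9, 0.1, 0.0],
    "lar": [0.8, 0.2, 0.1],
    "carro": [0.0, 0.9, 0.4],
}


def score_answer(expected, provided, embeddings):
    """Score one filled gap against the expected word.

    Returns 1.0 for an exact match, the cosine similarity when both
    words are in the vocabulary, and None when either word is missing
    (hypothetical convention: such answers would need manual review).
    """
    expected = expected.strip().lower()
    provided = provided.strip().lower()
    if expected == provided:
        return 1.0
    if expected not in embeddings or provided not in embeddings:
        return None
    return cosine_similarity(embeddings[expected], embeddings[provided])


if __name__ == "__main__":
    for answer in ["casa", "lar", "carro", "xyz"]:
        print(answer, score_answer("casa", answer, TOY_EMBEDDINGS))
```

With these toy vectors, a near-synonym ("lar" for "casa") scores higher than an unrelated word ("carro"). Scores of this kind are what the study compares against the judges' classifications.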

Article outline

1. Introduction
2. The Cloze procedure
3. Semantic similarity
4. Method
- 4.1 Participants
- 4.2 Cloze test procedure
- 4.3 Assessment of similarity by humans
- 4.4 Assessment of similarity by word embeddings models
- 4.5 Validation
5. Results
6. Conclusion
Acknowledgements
Note
References

Published online: 10 January 2025

https://doi.org/10.1075/ml.24027.deg

References

Bickley, A. C., Ellington, B. J., & Bickley, R. T. (1970). The cloze procedure: A conspectus. Journal of Reading Behavior, 2(3), 232–249.

Brown, J. D. (2002). Do cloze tests work? Or is it just an illusion? Second Language Studies, 21(1), 79–125.

Brown, J. D. (1980). Relative merits of four methods for scoring cloze tests. The Modern Language Journal, 64(3), 311–317.

Cardoso, P. B., Menezes, K. V., Freitas, F. O., & Freitag, R. M. K. (2024). Eficiência na leitura: medidas de precisão e velocidade entre alunos do Colégio de Aplicação da Universidade Federal de Sergipe. Revista Científica Sigma, 5(5), 120–143.

Chandrasekaran, D., & Mago, V. (2021). Evolution of semantic similarity — a survey. ACM Computing Surveys (CSUR), 54 (2), 1–37.

Cunha, N. D. B., & Santos, A. A. A. D. (2010). Estudos de validade entre instrumentos que avaliam habilidades linguísticas. Estudos de Psicologia (Campinas), 27 1, 305–314.

Darnell, D. K. (1968). The Development of an English Language Proficiency Test of Foreign Students, Using a Clozentropy Procedure. Final Report.

Gorman, J., & Curran, J. R. (2006, July). Scaling distributional similarity to large corpora. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (pp. 361–368).

Hartmann, N., Fonseca, E., Shulby, C., Treviso, M., Rodrigues, J., & Aluisio, S. (2017). Portuguese word embeddings: Evaluating on word analogies and natural language tasks. arXiv preprint arXiv:1708.06025.

Lange, K., Kühn, S., & Filevich, E. (2015). “Just another tool for online studies” (JATOS): An easy solution for setup and management of web servers supporting online studies. PLoS ONE, 10(6), e0130834.

Levy, O., & Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. Advances in Neural Information Processing Systems, 27.

Ling, W., Dyer, C., Black, A. W., & Trancoso, I. (2015). Two/too simple adaptations of word2vec for syntax problems. In Proceedings of the 2015 conference of the North American chapter of the Association for Computational Linguistics: Human language technologies (pp. 1299–1304).

Lowry, D. T., & Marr, T. J. (1975). Clozentropy as a measure of international communication comprehension. Public Opinion Quarterly, 39(3), 301–312.

Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Mohammad, S. M., & Hirst, G. (2012). Distributional measures of semantic distance: A survey. arXiv preprint arXiv:1203.1858.

Oller Jr, J. W., & Conrad, C. A. (1971). The Cloze technique and ESL proficiency. Language Learning, 21(2), 183–194.

Pennington, J., Socher, R., & Manning, C. D. (2014, October). GloVe: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532–1543).

Taylor, W. L. (1953). “Cloze procedure”: A new tool for measuring readability. Journalism Quarterly, 30(4), 415–433.

Wobbrock, J. O., Findlater, L., Gergle, D., & Higgins, J. J. (2011, May). The aligned rank transform for nonparametric factorial analyses using only ANOVA procedures. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 143–146).