Investigating the scopes of textual metrics for learner level discrimination and learner analytics

Gaillat, Thomas

doi:10.1075/scl.104.02gai

Part of

Complexity, Accuracy and Fluency in Learner Corpus Research
Edited by Agnieszka Leńko-Szymańska and Sandra Götz
[Studies in Corpus Linguistics 104] 2022
pp. 21–50

Thomas Gaillat | University of Rennes 2

This chapter investigates the linguistic interpretation of complexity metrics in L2 proficiency assessment. By analysing 84 formulas of metrics linked to lexical diversity, readability and syntactic complexity, we identify a taxonomy of their underlying linguistic scopes. These metrics are classified according to text, sentence, clause, phrase and word scopes with attributes and methods. Homogeneity of scopes was evaluated by applying a mixed clustering PCA approach to metrics computed for 328 L2 texts. Discriminative power was evaluated with a random forest approach on the same dataset including the CEFR levels. Results show that metrics are diversely clustered but they also suggest in-cluster homogeneity. The CEFR classification shows mixed results suggesting that diversity, repetition and size in word and text scopes are significant.

Keywords: complexity metrics, linguistic interpretation, scope, L2 proficiency, clustering
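To make the idea of scopes concrete, the following is a minimal sketch of how a few metrics could be computed at text, sentence and word scope. It assumes common textbook definitions (type-token ratio, root type-token ratio, mean sentence length, mean word length) and a naive regular-expression tokeniser; the function names are hypothetical, and nothing here is claimed to reproduce the 84 formulas or the tools used in the chapter.

```python
import math
import re


def tokenize(text):
    """Return lowercase word tokens (naive regex tokeniser, an assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Split on runs of sentence-final punctuation and drop empty pieces."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def scope_metrics(text):
    """Compute a few illustrative metrics, keyed by the scope they describe."""
    tokens = tokenize(text)
    sentences = split_sentences(text)
    if not tokens or not sentences:
        raise ValueError("text must contain at least one word")

    n_tokens = len(tokens)
    n_types = len(set(tokens))

    return {
        # Text scope: size and diversity of the whole production.
        "text_size_tokens": n_tokens,
        "text_ttr": n_types / n_tokens,
        "text_root_ttr": n_types / math.sqrt(n_tokens),
        # Sentence scope: average number of tokens per sentence.
        "sentence_mean_length": n_tokens / len(sentences),
        # Word scope: average number of characters per token.
        "word_mean_length": sum(len(t) for t in tokens) / n_tokens,
    }


if __name__ == "__main__":
    sample = "The cat sat. The cat ran fast."
    for name, value in scope_metrics(sample).items():
        print(f"{name}: {value:.3f}")
```

A table of such values, one row per learner text, is the kind of input that a variable-clustering step or a random forest classifier trained on CEFR labels would then take.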

Article outline

1. Introduction
2. Theoretical background
- 2.1 Complexity and linguistic scopes
- 2.2 Automatic language assessment and complexity metrics as features
3. How scopes relate to metrics: A taxonomy
- Example 1
- Example 2
- Example 3
4. Data
- 4.1 Corpus
- 4.2 Human CEFR ratings
- 4.3 Pre-processing and the dataset
- 4.4 Experimental setup
  - Task 1
  - Task 2
5. Results
- 5.1 Task 1: Homogeneity of the metrics and scopes
- 5.2 Task 2: Proficiency level classification
  - Classification with six classes
  - Classification with 3 classes
6. Discussion and future perspectives
Notes
References
Appendix

Published online: 1 December 2022

https://doi.org/10.1075/scl.104.02gai
