Investigating the scopes of textual metrics for learner level discrimination and learner analytics
This chapter investigates the linguistic interpretation of complexity metrics in L2 proficiency assessment. By analysing the formulas of 84 metrics linked to lexical diversity, readability and syntactic complexity, we derive a taxonomy of their underlying linguistic scopes. The metrics are classified according to text, sentence, clause, phrase and word scopes, each with attributes and methods. Homogeneity of scopes was evaluated by applying a mixed clustering PCA approach to metrics computed for 328 L2 texts. Discriminative power was evaluated with a random forest approach on the same dataset, including the CEFR levels. The results show that the metrics fall into diverse clusters, while also suggesting homogeneity within clusters. The CEFR classification yields mixed results, suggesting that diversity, repetition and size in the word and text scopes are significant.
Article outline
- 1. Introduction
- 2. Theoretical background
- 2.1 Complexity and linguistic scopes
- 2.2 Automatic language assessment and complexity metrics as features
- 3. How scopes relate to metrics: A taxonomy
- Example 1
- Example 2
- Example 3
- 4. Data
- 4.1 Corpus
- 4.2 Human CEFR ratings
- 4.3 Pre-processing and the dataset
- 4.4 Experimental setup
- 5. Results
- 5.1 Task 1: Homogeneity of the metrics and scopes
- 5.2 Task 2: Proficiency level classification
- Classification with six classes
- Classification with three classes
- 6. Discussion and future perspectives
- Notes
- References
- Appendix