Edited by Agnieszka Leńko-Szymańska and Sandra Götz
[Studies in Corpus Linguistics 104] 2022
pp. 21–50
This chapter investigates the linguistic interpretation of complexity metrics in L2 proficiency assessment. By analysing 84 formulas of metrics linked to lexical diversity, readability and syntactic complexity, we derive a taxonomy of their underlying linguistic scopes. The metrics are classified according to text, sentence, clause, phrase and word scopes, with attributes and methods. Homogeneity of scopes was evaluated by applying a mixed clustering PCA approach to metrics computed for 328 L2 texts. Discriminative power was evaluated with a random forest approach on the same dataset, which includes the CEFR levels. The results show that the metrics fall into diverse clusters, but they also suggest homogeneity within clusters. The CEFR classification yields mixed results, suggesting that diversity, repetition and size at the word and text scopes are significant.
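To illustrate what a lexical diversity formula operating at the word and text scopes can look like, the following minimal Python sketch computes two widely used measures: the type-token ratio and Guiraud's index. The abstract does not say whether these are among the 84 formulas analysed in the chapter. The tokenizer, function names and sample sentence are hypothetical and serve only as an illustration.

```python
import math
import re


def tokenize(text):
    """Hypothetical tokenizer: lowercase the text and keep alphabetic word forms."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens):
    """Number of distinct word forms (types) divided by the number of tokens."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def guiraud_index(tokens):
    """Types divided by the square root of tokens, which dampens the effect of text length."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / math.sqrt(len(tokens))


# Invented sample sentence, not taken from the chapter's dataset.
sample = "the cat sat on the mat and the dog sat too"
tokens = tokenize(sample)

print(f"tokens: {len(tokens)}, types: {len(set(tokens))}")
print(f"TTR: {type_token_ratio(tokens):.3f}")
print(f"Guiraud: {guiraud_index(tokens):.3f}")
```

Both measures combine a count of distinct word forms (diversity) with a count of running words (size). They differ only in how they normalise for text length.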