Researching learner language through POS keyword and syntactic complexity analyses
In this paper, we explore the affordances of two different research methods that may be instrumental in analysing learner language complexity: standard corpus linguistics methodology and automatic syntactic complexity analysers. Our results suggest that POS keyword analysis and automatic syntactic analysis are both effective for the identification of linguistic features at different levels of development in instructed SLA. In particular, countable nouns, prepositional phrases, verbs and general adverbs are criterial features that define the transition from lower to higher secondary school language learning in the Spanish component of the ICCI corpus. We suggest that the analysis of complexity in noun phrases is of great interest for researchers and teachers in terms of identifying development milestones in language acquisition.
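As a rough illustration of what a POS keyword analysis involves, the sketch below compares part-of-speech tag frequencies in two learner sub-corpora and flags tags whose frequencies differ significantly. It assumes the log-likelihood keyness statistic and a 3.84 cut-off (p < 0.05, 1 d.f.); the abstract does not state which statistic or threshold the authors used. The function names, the tag labels and the toy data are hypothetical, invented for illustration only.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for one tag.

    a, b: frequency of the tag in corpus 1 and corpus 2
    c, d: total number of tags in corpus 1 and corpus 2
    """
    # Expected frequencies if the tag were equally likely in both corpora
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def pos_keywords(tags_1, tags_2, threshold=3.84):
    """Return (tag, LL, direction) for tags that are key in corpus 1
    relative to corpus 2, sorted by descending log-likelihood.

    threshold=3.84 is an assumed cut-off (p < 0.05), not the paper's.
    """
    counts_1, counts_2 = Counter(tags_1), Counter(tags_2)
    n1, n2 = len(tags_1), len(tags_2)
    results = []
    for tag in set(counts_1) | set(counts_2):
        a, b = counts_1[tag], counts_2[tag]
        ll = log_likelihood(a, b, n1, n2)
        if ll >= threshold:
            # Compare relative frequencies to see the direction of the difference
            direction = "overused" if a / n1 > b / n2 else "underused"
            results.append((tag, round(ll, 2), direction))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Toy tag sequences (invented data, not taken from the ICCI corpus)
lower_grades = ["NOUN"] * 50 + ["VERB"] * 50
higher_grades = ["NOUN"] * 80 + ["VERB"] * 20

for tag, ll, direction in pos_keywords(lower_grades, higher_grades):
    print(tag, ll, direction)
```

In practice the tag sequences would come from an automatically tagged corpus, and tagging errors on learner text would need to be taken into account when interpreting the resulting keyness values.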
Article outline
- 1. Introduction
- 2. Research methodology
- 2.1 Data
- 2.2 Research methods
- 3. Contrasting learner corpora (1): POS keyword analysis
- 3.1 Grades 7 and 8
- 3.2 Grades 11 and 12
- 3.3 Grades 7 and 8 vs Grades 11 and 12
- 4. Contrasting learner corpora (2): Automatic syntactic complexity analysis
- 4.1 Grades 7, 8, 11 and 12: Complexity in the noun phrase
- 4.2 Grades 7, 8, 11 and 12: Syntactic sophistication
- 4.2.1 Traditional measures of syntactic complexity
- 4.2.2 Measures of syntactic sophistication
- 4.3 Grades 7 and 8 vs Grades 11 and 12: Complexity in the noun phrase and syntactic sophistication measures
- 5. Discussion and pedagogical implications
- 5.1 RQ (1) Do different groups of learners present distinct linguistic features? Can these features be identified by means of automatic analysis of language?
- 5.2 RQ (2) Do different methods to carry out automatic analysis of language present a similar picture of complexity and language development? How do the research methods in this paper complement each other? How does this complementarity inform language teaching?
- 6. Conclusion and some limitations
- Notes
- References
- Appendix