Researching learner language through POS keyword and syntactic complexity analyses
In this paper, we explore the affordances of two different research methods that may be instrumental in analysing learner language complexity: standard corpus linguistics methodology and automatic syntactic complexity analysers. Our results suggest that POS keyword analysis and automatic syntactic analysis are both effective for the identification of linguistic features at different levels of development in instructed SLA. In particular, countable nouns, prepositional phrases, verbs and general adverbs are criterial features that define the transition from lower to higher secondary school language learning in the Spanish component of the ICCI corpus. We suggest that the analysis of complexity in noun phrases is of great interest for researchers and teachers in terms of identifying development milestones in language acquisition.
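As an illustration of what a POS keyword analysis involves, the sketch below compares the part-of-speech tag frequencies of two sub-corpora and flags tags whose frequencies differ significantly. It assumes the log-likelihood statistic that is widely used for keyness in corpus linguistics; the abstract does not state which statistic the authors used. The function names, the tag labels and all counts are hypothetical and carry no relation to the ICCI data.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for one tag.

    a, b: frequency of the tag in the study and reference corpus.
    c, d: total number of tags in the study and reference corpus.
    """
    # Expected frequencies if the tag were equally likely in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    # A zero observed frequency contributes nothing to the sum.
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def pos_keywords(tags_study, tags_ref, threshold=3.84):
    """Return (tag, LL, direction) for tags at or above the threshold.

    threshold=3.84 corresponds to p < 0.05 for one degree of freedom.
    Direction is "+" when the tag is relatively more frequent in the
    study corpus and "-" when it is relatively less frequent.
    """
    f1, f2 = Counter(tags_study), Counter(tags_ref)
    n1, n2 = sum(f1.values()), sum(f2.values())
    rows = []
    for tag in set(f1) | set(f2):
        a, b = f1[tag], f2[tag]
        ll = log_likelihood(a, b, n1, n2)
        if ll >= threshold:
            direction = "+" if a / n1 > b / n2 else "-"
            rows.append((tag, round(ll, 2), direction))
    # Strongest keyness first.
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Invented toy tag sequences standing in for two tagged sub-corpora.
lower_grades = ["NOUN"] * 30 + ["VERB"] * 50 + ["ADP"] * 5 + ["ADV"] * 15
upper_grades = ["NOUN"] * 35 + ["VERB"] * 30 + ["ADP"] * 25 + ["ADV"] * 10

for tag, ll, direction in pos_keywords(lower_grades, upper_grades):
    print(tag, ll, direction)
```

In practice the tag sequences would come from a POS tagger run over the learner texts, and a stricter threshold may be preferable when many tags are tested at once.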
Article outline
- 1. Introduction
- 2. Research methodology
- 2.1 Data
- 2.2 Research methods
- 3. Contrasting learner corpora (1): POS keyword analysis
- 3.1 Grades 7 and 8
- 3.2 Grades 11 and 12
- 3.3 Grades 7 and 8 vs Grades 11 and 12
- 4. Contrasting learner corpora (2): Automatic syntactic complexity analysis
- 4.1 Grades 7, 8, 11 and 12: Complexity in the noun phrase
- 4.2 Grades 7, 8, 11 and 12: Syntactic sophistication
- 4.2.1 Traditional measures of syntactic complexity
- 4.2.2 Measures of syntactic sophistication
- 4.3 Grades 7 and 8 vs Grades 11 and 12: Complexity in the noun phrase and syntactic sophistication measures
- 5. Discussion and pedagogical implications
- 5.1 RQ (1) Do different groups of learners present distinct linguistic features? Can these features be identified by means of automatic analysis of language?
- 5.2 RQ (2) Do different methods to carry out automatic analysis of language present a similar picture of complexity and language development? How do the research methods in this paper complement each other? How does this complementarity inform language teaching?
- 6. Conclusion and some limitations
- Notes
- References
- Appendix