Current syntactic annotation of large-scale learner corpora mainly relies on “standard parsers” trained on native-language data. Understanding how these parsers perform on learner data is important for downstream research and applications involving learner language. This study evaluates the performance of multiple standard probabilistic parsers on learner English. Our contributions are threefold. First, we demonstrate that the common practice of constructing a gold standard – by manually correcting the pre-annotation of a single parser – can introduce bias into parser evaluation, and we propose an alternative annotation method that controls for this bias. Second, we quantify the influence of learner errors on parsing errors and identify the learner errors that affect parsing most. Finally, we compare the performance of the parsers on learner English and native English. Our results have useful implications for selecting a standard parser for learner English.
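The evaluation summarized above compares each parser’s dependency output against a gold standard. A standard metric pair for this task is the unlabeled and labeled attachment score (UAS/LAS): the fraction of tokens whose predicted head (and, for LAS, also the relation label) matches the gold annotation. The sketch below is illustrative only – the function name and the example sentence are not from the paper:

```python
def attachment_scores(gold, pred):
    """Compute UAS and LAS.

    gold, pred: aligned lists of (head_index, relation_label) per token.
    UAS counts tokens with the correct head; LAS additionally requires
    the correct relation label.
    """
    assert len(gold) == len(pred), "token sequences must be aligned"
    n = len(gold)
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))  # head only
    las_hits = sum(g == p for g, p in zip(gold, pred))        # head + label
    return uas_hits / n, las_hits / n

# Hypothetical 4-token sentence: the parser recovers 3 of 4 heads
# but only 2 of 4 (head, label) pairs.
gold = [(2, "nsubj"), (0, "root"), (2, "dobj"), (3, "amod")]
pred = [(2, "nsubj"), (0, "root"), (2, "iobj"), (2, "amod")]
uas, las = attachment_scores(gold, pred)  # 0.75, 0.5
```

With a single-parser-corrected gold standard, these scores can be inflated for the parser that produced the pre-annotation, which is the bias the proposed annotation method is designed to control for.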
2024. Utility of Kolmogorov complexity measures: Analysis of L2 groups and L1 backgrounds. PLOS ONE 19:4, pp. e0301806 ff.
Bannò, Stefano & Marco Matassoni. 2024. Back to grammar: Using grammatical error correction to automatically assess L2 speaking proficiency. Speech Communication 157, pp. 103025 ff.
Kim, Minjin & Xiaofei Lu. 2024. L2 English speaking syntactic complexity: Data preprocessing issues, reliability of automated analysis, and the effects of proficiency, L1 background, and topic. The Modern Language Journal 108:1, pp. 270 ff.
Kyle, Kristopher & Masaki Eguchi. 2024. Evaluating NLP models with written and spoken L2 samples. Research Methods in Applied Linguistics 3:2, pp. 100120 ff.
Lestari, Febriana. 2024. Analysis of verb argument constructions (VACs) in L2 learners across proficiency levels: A corpus-based study in L1 Indonesian. Applied Corpus Linguistics 4:3, pp. 100097 ff.
Ma, Hong, Jinglei Wang & Lianzhen He. 2024. Linguistic Features Distinguishing Students’ Writing Ability Aligned with CEFR Levels. Applied Linguistics 45:4, pp. 637 ff.
2024. The potential influence of cross-linguistic lexical similarity on lexical diversity in L2 English writing. Corpora 19:2, pp. 131 ff.
Spina, Stefania, Irene Fioravanti, Luciana Forti & Fabio Zanda. 2024. The CELI corpus: Design and linguistic annotation of a new online learner corpus. Second Language Research 40:2, pp. 457 ff.
Vercellotti, MaryLou & Sean Hall. 2024. Coding all clauses in L2 data: A call for consistency. Research Methods in Applied Linguistics 3:3, pp. 100132 ff.
Xia, Detong, Mark A. Sulzer & Hye K. Pae. 2024. Phrase-frames in business emails: A contrast between learners of business English and working professionals. Text & Talk 44:5, pp. 693 ff.
Yan, Hengbin & Yinghui Li. 2024. Constraction: A tool for the automatic extraction and interactive exploration of linguistic constructions. Linguistics Vanguard 9:1, pp. 215 ff.
Berti, Barbara, Andrea Esuli & Fabrizio Sebastiani. 2023. Unravelling interlanguage facts via explainable machine learning. Digital Scholarship in the Humanities 38:3, pp. 953 ff.
2023. Examining the potential influence of crosslinguistic lexical similarity on word-choice transfer in L2 English. PLOS ONE 18:2, pp. e0281137 ff.
Du, Xiangtao, Muhammad Afzaal & Hind Al Fadda. 2022. Collocation Use in EFL Learners’ Writing Across Multiple Language Proficiencies: A Corpus-Driven Study. Frontiers in Psychology 13.
Durrant, Philip. 2022. Studying children’s writing development with a corpus. Applied Corpus Linguistics 2:3, pp. 100026 ff.
Gaillat, Thomas, Andrew Simpkin, Nicolas Ballier, Bernardo Stearns, Annanda Sousa, Manon Bouyé & Manel Zarrouk. 2022. Predicting CEFR levels in learners of English: The use of microsystem criterial features in a machine learning approach. ReCALL 34:2, pp. 130 ff.
McCallum, Lee & Philip Durrant. 2022. Shaping Writing Grades.
Murakami, Akira & Nick C. Ellis. 2022. Effects of Availability, Contingency, and Formulaicity on the Accuracy of English Grammatical Morphemes in Second Language Writing. Language Learning 72:4, pp. 899 ff.
Tan, Yi & Ute Römer. 2022. Using phrase-frames to trace the language development of L1 Chinese learners of English. System 108, pp. 102844 ff.
Xia, Detong, Haiyang Ai & Hye K. Pae. 2022. “Please let me know”. International Journal of Learner Corpus Research 8:1, pp. 1 ff.
Huang, Yan, Akira Murakami, Theodora Alexopoulou & Anna Korhonen. 2021. Automatic extraction of subordinate clauses and its application in second language acquisition research. Behavior Research Methods 53:2, pp. 803 ff.
Ballier, Nicolas, Thomas Gaillat, Andrew Simpkin, Bernardo Stearns, Manon Bouyé & Manel Zarrouk. 2019. A Supervised Learning Model for the Automatic Assessment of Language Levels Based on Learner Errors. In Transforming Learning with Meaningful Technologies [Lecture Notes in Computer Science, 11722], pp. 308 ff.
2022. Automated Essay Scoring [Synthesis Lectures on Human Language Technologies].
This list is based on CrossRef data as of 11 September 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.