Edited by Kristopher Kyle
International Journal of Learner Corpus Research 7:1 (2021), pp. 17–52
Automated annotation of learner English
An evaluation of software tools
This paper explores the utility of natural language processing (NLP) tools for learner language analyses by comparing automatic linguistic annotation against a gold standard produced by human annotators. Although a number of automated annotation tools for English are currently available, little research exists on how accurate these tools are when applied to learner data. We compare the performance of three linguistic annotation tools (a tagger and two parsers) on academic writing in English produced by learners (both L1 and L2 English speakers). We focus on lexico-grammatical patterns, including both phrasal and clausal features, since these are frequently investigated in applied linguistics studies. We report both the precision and the recall of the annotation output for argumentative texts in English across four L1s: Arabic, Chinese, English, and Korean. We close with a discussion of the benefits and drawbacks of using automatic tools to annotate learner language.
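Since the evaluation hinges on precision and recall of tool output against human labels, the following minimal Python sketch illustrates the scoring logic; the (token index, feature label) representation and the feature names are hypothetical simplifications for illustration, not the authors' actual data format or pipeline.

```python
# Hypothetical sketch: scoring automatic annotation against a human
# gold standard (a simplification, not the paper's actual pipeline).

def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Score tool-predicted feature annotations against gold-standard ones.

    Both sets hold hashable identifiers, e.g. (token_index, label) pairs
    marking where a feature such as an attributive adjective or a
    relative clause was annotated.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Toy example: one text's tool output vs. human annotation.
predicted = {(3, "attributive_adj"), (7, "noun_noun"), (12, "rel_clause")}
gold = {(3, "attributive_adj"), (12, "rel_clause"), (15, "comp_clause")}

p, r = precision_recall(predicted, gold)
print(f"precision = {p:.2f}, recall = {r:.2f}")  # precision = 0.67, recall = 0.67
```

In this toy case the tool finds two of the three gold-standard features and also produces one spurious label, so precision and recall both come out to 2/3.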
Article outline
- 1. Introduction
- 2. Lexico-grammatical patterns in L2 English academic writing
- 2.1 Performance of automated annotation
- 3. Tool performance evaluation
- 3.1 Methods
- 3.1.1 English academic writing corpus
- 3.1.2 Choice of automatic annotation tools
- 3.1.3 Gold standard labels
- 3.1.4 Feature extraction
- 3.1.4.1 Attributive adjectives
- 3.1.4.2 Noun-noun sequences
- 3.1.4.3 Relative clauses
- 3.1.4.4 Complement clauses
- 3.1.5 Output alignment
- 3.1.6 Analysis
- 3.2 Results
- 3.2.1 Phrasal features
- 3.2.2 Clausal features
- 4. Discussion and conclusion
- Acknowledgements
- References
https://doi.org/10.1075/ijlcr.20003.pic