Automatic error tagging of spelling mistakes in learner corpora

Rayson, Paul; Baron, Alistair

doi:10.1075/scl.45.09ray

Part of

A Taste for Corpora: In honour of Sylviane Granger
Edited by Fanny Meunier, Sylvie De Cock, Gaëtanelle Gilquin and Magali Paquot
[Studies in Corpus Linguistics 45] 2011
pp. 109–126

Manual error tagging of learner corpus data is time-consuming and creates a bottleneck in the analysis of learner corpora. This has led researchers to apply techniques from the area of natural language processing to assist in the automatic analysis of such data. This chapter presents the novel application of a hybrid approach to the detection of spelling errors in learner data. The Variant Detector (VARD) software was developed to match historical spelling variants to modern equivalents, with the intention of improving the accuracy and robustness of corpus linguistics techniques when applied to historical corpora. Here, we describe its application to the detection of spelling errors in written learner corpora consisting of 50,000 words from each of three learner backgrounds (French, German and Spanish).

Published online: 18 August 2011

https://doi.org/10.1075/scl.45.09ray

Cited by five other publications

Claridge, Claudia

2022. Review of Elena Seoane and Douglas Biber eds. 2021. Corpus-based Approaches to Register Variation. Amsterdam: John Benjamins. ISBN: 978-9-027-21054-8. http://doi.org/10.1075/scl.103. Research in Corpus Linguistics 10:2, pp. 187 ff.

Calle-Martín, Javier

2021. A corpus-based study of abbreviations in early English medical writing. Research in Corpus Linguistics 9:2, pp. 114 ff.

Gilquin, Gaëtanelle

2020. Learner Corpora. In A Practical Handbook of Corpus Linguistics, pp. 283 ff.

Löfberg, Laura & Paul Rayson

2019. Developing Multilingual Automatic Semantic Annotation Systems. In Advances in Empirical Translation Studies, pp. 94 ff.

Smith, Catherine, Svenja Adolphs, Kevin Harvey & Louise Mullany

2014. Spelling errors and keywords in born-digital data: a case study using the Teenage Health Freak Corpus. Corpora 9:2, pp. 137 ff.

This list is based on CrossRef data as of 16 July 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.