Inter-rater reliability in Learner Corpus Research
Insights from a collaborative study on adverb placement
In Learner Corpus Research (LCR), manual coding and annotation of linguistic features is a common source of error. Coefficients of inter-rater reliability are used to estimate the amount of error present in a coded dataset. However, despite the importance of reliability and internal consistency for validity and, by extension, for study quality, interpretability and generalizability, it is surprisingly uncommon for LCR studies to report such coefficients. In this Methods Report, we use a recent collaborative research project to illustrate why inter-rater reliability deserves consideration. In doing so, we hope to initiate methodological discussion on instrument design, piloting and evaluation. We also suggest some ways forward to encourage greater transparency in reporting practices.
Keywords: inter-rater reliability, coding errors, reporting practices, study quality, Fleiss’ kappa
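As a minimal illustration of the kind of coefficient named in the keywords, the following Python sketch computes Fleiss' kappa for a set of items that were each coded by the same number of raters. It is a sketch under stated assumptions and not the procedure used in the study: the function name `fleiss_kappa`, the category labels and the toy ratings are all hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for `ratings`, a list of items where each item is a
    list of category labels, one label per rater.

    Assumes every item was coded by the same number of raters (at least 2).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Per-item counts of how many raters chose each category.
    counts = [Counter(item) for item in ratings]

    # Observed agreement: proportion of agreeing rater pairs, averaged over items.
    p_items = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_bar = sum(p_items) / n_items

    # Expected agreement from the overall proportion of each category.
    totals = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())

    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical toy data: 4 items, 3 raters, 2 invented categories.
toy_ratings = [
    ["medial", "medial", "medial"],
    ["medial", "medial", "final"],
    ["final", "final", "final"],
    ["medial", "final", "final"],
]
print(round(fleiss_kappa(toy_ratings), 3))
```

In this toy example, two of the four items show full agreement and two show a two-to-one split, which yields a kappa of about 0.33. Real coding data would involve far more items, and the coefficient should be interpreted alongside the category distribution.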