In Learner Corpus Research (LCR), the manual coding and annotation of linguistic features is a common source of
error. To estimate the amount of error present in a coded dataset, researchers use coefficients of inter-rater reliability. However, despite
the importance of reliability and internal consistency for validity and, by extension, study quality, interpretability and generalizability,
it is surprisingly uncommon for studies in the field of LCR to report such reliability coefficients. In this Methods Report, we use a
recent collaborative research project to illustrate the pertinence of considering inter-rater reliability. In doing so, we hope to initiate
methodological discussion on instrument design, piloting and evaluation. We also suggest some ways forward to encourage increased
transparency in reporting practices.
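
As a minimal illustration of what such a coefficient measures, the sketch below computes Cohen's kappa for two coders who assign nominal labels to the same items. It is not the procedure used in the project described above. The function name, the label set and the ten codes are invented for the example, and kappa is only one of several coefficients a study might report.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal labels to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: proportion of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: for each label, multiply the two raters' marginal
    # proportions, then sum over labels.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes from two annotators for the same ten corpus hits.
coder_1 = ["error", "error", "ok", "ok", "error", "ok", "ok", "error", "ok", "ok"]
coder_2 = ["error", "ok", "ok", "ok", "error", "ok", "error", "error", "ok", "ok"]

kappa = cohen_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")
```

With these invented codes the two coders agree on 8 of 10 items, yet kappa is about 0.58 because agreement expected by chance is 0.52. That gap between raw agreement and a chance-corrected coefficient is one reason to report the coefficient rather than percentage agreement alone.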