Peer and teacher assessment of second-language writing in high- and low-stakes conditions

Rezaei, Amir; Barkaoui, Khaled

doi:10.1075/itl.20006.rez

Article published in:

ITL - International Journal of Applied Linguistics
Vol. 172:2 (2021), pp. 199–228


Amir Rezaei | York University, Toronto, Canada

Khaled Barkaoui | York University, Toronto, Canada

This study aimed to compare second-language (L2) students’ ratings of their peers’ essays on multiple criteria with their teachers’ ratings under different assessment conditions. Forty EFL teachers and 40 EFL students took part in the study. Each participant rated one essay on five criteria twice, once under high-stakes and once under low-stakes assessment conditions. Multifaceted Rasch analysis and correlation analyses were conducted to compare rater severity and consistency across rater groups, rating criteria and assessment conditions. The results revealed more variation in students’ ratings than in teachers’ ratings across assessment conditions. Additionally, both rater groups showed different degrees of severity in assessing different criteria. In general, students were significantly more severe than teachers on language use, whereas teachers were significantly more severe than students on organization. Student and teacher severity also varied across rating criteria and assessment conditions. The findings of this study have implications for planning and implementing peer assessment in the L2 writing classroom as well as for future research.
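The abstract names multifaceted Rasch measurement (MFRM) without showing how it separates rater severity from essay quality. As a minimal sketch, assuming the general rating-scale form of the many-facet Rasch model described by Linacre (the article's exact model, including any facet for rater group or assessment condition, is not given here), the log-odds of a rating in category k rather than k - 1 is log(P_k / P_(k-1)) = B - C - D - F_k, where B is the essay measure, C the rater's severity, D the criterion's difficulty and F_k the k-th step threshold, all in logits. The function names and parameter values below are hypothetical illustrations, not estimates from the study.

```python
import math
from typing import List, Sequence


def mfrm_category_probs(essay_measure: float,
                        rater_severity: float,
                        criterion_difficulty: float,
                        thresholds: Sequence[float]) -> List[float]:
    """Return P(rating = k) for k = 0..len(thresholds) under a
    rating-scale many-facet Rasch model (all parameters in logits)."""
    # Linear predictor shared by every category step: B - C - D.
    eta = essay_measure - rater_severity - criterion_difficulty
    # Category 0 is the reference; step k adds (eta - F_k) to the log-odds.
    log_numerators = [0.0]
    for step in thresholds:
        log_numerators.append(log_numerators[-1] + eta - step)
    # Subtract the maximum before exponentiating to avoid overflow.
    top = max(log_numerators)
    weights = [math.exp(v - top) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_rating(essay_measure: float,
                    rater_severity: float,
                    criterion_difficulty: float,
                    thresholds: Sequence[float]) -> float:
    """Model-expected rating: the sum of k * P(rating = k)."""
    probs = mfrm_category_probs(essay_measure, rater_severity,
                                criterion_difficulty, thresholds)
    return sum(k * p for k, p in enumerate(probs))


if __name__ == "__main__":
    # Hypothetical values: one essay, one criterion, a 0-3 rating scale.
    steps = [-1.0, 0.0, 1.0]
    lenient = expected_rating(0.5, -0.5, 0.0, steps)
    severe = expected_rating(0.5, 0.5, 0.0, steps)
    print(f"lenient rater: {lenient:.2f}, severe rater: {severe:.2f}")
```

Holding the essay and criterion fixed, raising only the rater's severity lowers the expected rating. This is the sense in which an MFRM analysis can report one rater group as more severe than another on a given criterion while still placing both groups' ratings on a common logit scale.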

Keywords: second language assessment, peer assessment, peer assessment accuracy, rater bias, high-stakes assessment, rating criteria, quality of peer assessment

Article outline

Introduction
Factors affecting the quality of PA
Uses of peer assessment
Studies on the severity/leniency of teacher and peer raters
The present study
- Method
- Data analysis
Findings
- Descriptive statistics and correlational analyses
- MFRM analyses
- MFRM interaction analyses
Discussion and implications
References

Published online: 18 November 2020



References

Bachman, L. F. (2004). Statistical analyses for language assessment. Ernst Klett Sprachen.

Bachman, L. F., & Palmer, A. S. (2010). Language assessment in practice: Developing language assessments and justifying their use in the real world. Oxford: Oxford University Press.

Baker, B. A. (2010). Playing with the stakes: A consideration of an aspect of the social context of a gatekeeping writing assessment. Assessing Writing, 15(3), 133–153.

Ballantyne, R., Hughes, K., & Mylonas, A. (2002). Developing procedures for implementing peer assessment in large classes using an action research process. Assessment & Evaluation in Higher Education, 27, 427–441.

Barkaoui, K. (2013). Multifaceted Rasch analysis for test evaluation. In The companion to language assessment (Vol. 3, pp. 1301–1322).

Biber, D., Nekrasova, T., & Horn, B. (2011). The effectiveness of feedback for L1-English and L2-writing development: A meta-analysis. ETS Research Report Series, 2011(1), i–99.

Black, P., Harrison, C., Lee, C., Marshall, B., & Wiliam, D. (2004). Working inside the black box: Assessment for learning in the classroom. Phi Delta Kappan, 86(1), 8–21.

Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5(1), 7–74.

Chapelle, C. A., Enright, M. K., & Jamieson, J. M. (Eds.). (2008). Building a validity argument for the Test of English as a Foreign Language™. Routledge.

Cheng, W., & Warren, M. (2005). Peer assessment of language proficiency. Language Testing, 22(3), 93–121.

Cho, K., Schunn, C. D., & Wilson, R. W. (2006). Validity and reliability of scaffolded peer assessment of writing from instructor and student perspectives. Journal of Educational Psychology, 98, 891–901.

De Ayala, R. J. (2009). The Theory and Practice of Item Response Theory. Psychometrika, 75(4), 778–779.

Esfandiari, R., & Myford, C. M. (2013). Severity differences among self-assessors, peer-assessors, and teacher assessors rating EFL essays. Assessing Writing, 18(2), 111–131.

Falchikov, N. (1995). Peer feedback marking: Developing peer assessment. Innovations in Education and Training International, 32, 175–187.

Falchikov, N. (2005). Improving assessment through student involvement: Practical solutions for aiding learning in higher and further education. London: RoutledgeFalmer.

Falchikov, N., & Goldfinch, J. (2000). Student peer assessment in higher education: A meta-analysis comparing peer and teacher marks. Review of Educational Research, 70(3), 287–322.

Farrokhi, F., Esfandiari, R., & Schaefer, E. (2012). A many-facet Rasch measurement of differential rater severity/leniency in three types of assessment. JALT Journal, 34(1), 79–101.

Gielen, S., Peeters, E., Dochy, F., Onghena, P., & Struyven, K. (2010). Improving the effectiveness of peer feedback for learning. Learning and Instruction, 20(4), 304–315.

Hattie, J. (2009). Visible learning: A synthesis of over 800 meta-analyses relating to achievement. Routledge.

Jacobs, H. L., Zinkgraf, S. A., Wormuth, D. R., Hartfiel, V. F., & Hughey, J. B. (1981). Testing ESL composition: A practical approach. Rowley, MA: Newbury House.

Jeffery, D., Yankulov, K., Crerar, A., & Ritchie, K. (2016). How to achieve accurate peer assessment for high value written assignments in a senior undergraduate course. Assessment & Evaluation in Higher Education, 41, 127–140.

Kearney, S. P., & Perkins, T. (2014). Engaging students through assessment: The success and limitations of the ASPAL (authentic self and peer-assessment for learning) model. Journal of University Teaching and Learning Practice, 11(3), 1–13.

Kearney, S., Perkins, T., & Clark, S. K. (2016). Using self- and peer-assessments for summative purposes: Analysing the relative validity of the AASL (authentic assessment for sustainable learning) model. Assessment & Evaluation in Higher Education, 41(6), 840–853.

Lamb, T. (2010). Assessment of autonomy or assessment for autonomy? Evaluating learner autonomy for formative purposes. In A. Paran & L. Sercu (Eds.), Testing the untestable in language education (pp. 98–119). Bristol: Multilingual Matters.

Lee, S. B. (2016). University students’ experience of ‘scale-referenced’ peer assessment for a consecutive interpreting examination. Assessment and Evaluation in Higher Education, 41, 1–15.

Linacre, J. M. (2002). What do infit and outfit, mean-square and standardized mean? Rasch Measurement Transactions, 16(2), 878.

Linacre, J. M. (2005). A user’s guide to FACETS: Rasch-model computer programs [Software manual]. Chicago, IL: Winsteps.com.

Linacre, J. M. (2013). A user’s guide to FACETS. Program manual 3.71.0. Rasch-model computer programs. Retrieved from [URL]

Little, D. (2009). Language learner autonomy and the European language portfolio: Two L2 English examples. Language Teaching, 42(2), 222–233.

Liu, X., & Li, L. (2014). Assessment training effects on student assessment skills and task performance in a technology-facilitated peer assessment. Assessment and Evaluation in Higher Education, 39(3), 275–292.

Matsuno, S. (2009). Self-, peer-, and teacher-assessments in Japanese university EFL writing classrooms. Language Testing, 26(1), 75–100.

Myford, C. M., & Wolfe, E. W. (2004). Detecting and measuring rater effects using many-facet Rasch measurement: Part II. In E. V. Smith & R. M. Smith (Eds.), Introduction to Rasch measurement (pp. 518–574). Maple Grove, MN: JAM Press.

Nakamura, Y. (2002). Teacher assessment and peer assessment in practice (English teaching). Educational Studies, 44, 203–215.

Nguyen, L. T. C., & Gu, Y. (2013). Strategy-based instruction: A learner-focused approach to developing learner autonomy. Language Teaching Research, 17(1), 9–30.

Ozogul, G., & Sullivan, H. (2007). Student performance and attitudes under formative evaluation by teacher, self- and peer-evaluators. Educational Technology Research and Development, 57(3), 393–410.

Saito, H. (2008). EFL classroom peer assessment: Training effects on rating and commenting. Language Testing, 25, 553–581.

Saito, H., & Fujita, T. (2004). Characteristics and user acceptance of peer rating in EFL writing classroom. Language Teaching Research, 8(1), 31–54.

Topping, K. J. (2003). Self and peer assessment in school and university: Reliability, validity and utility. In M. S. R. Segers, F. J. R. C. Dochy, & E. C. Cascallar (Eds.), Optimizing new modes of assessment: In search of qualities and standards (pp. 55–87). Dordrecht, Netherlands: Kluwer Academic.

Topping, K. J. (2010). Methodological quandaries in studying process and outcomes in peer assessment. Learning and Instruction, 20, 339–343.

Van Gennip, N. A. E., Segers, M. S. R., & Tillema, H. H. (2009). Peer assessment for learning from a social perspective: The influence of interpersonal variables and structural features. Educational Research Review, 4, 41–54.

Van Gennip, N. A. E., Segers, M. S. R., & Tillema, H. H. (2010). Peer assessment as a collaborative learning activity: The role of interpersonal factors and conceptions. Learning and Instruction, 20(4), 280–290.

Van Zundert, M., Sluijsmans, D. M. A., & Van Merriënboer, J. J. G. (2010). Effective peer assessment processes: Research findings and future directions. Learning and Instruction, 20(4), 270–279.

Weaver, D., & Esposto, A. (2012). Peer assessment as a method of improving student engagement. Assessment & Evaluation in Higher Education, 37(7), 805–816.

Weir, C. J. (2005). Language testing and validation. Hampshire: Palgrave Macmillan.

Cited by one other publication

Paquot, M., Rubin, R., & Vandeweerd, N. (2022). Crowdsourced adaptive comparative judgment: A community-based solution for proficiency rating. Language Learning, 72(3), 853 ff.

This list is based on CrossRef data as of 6 August 2024 and may not be complete.