An Attempt to Limit Subjectivity in the Assessment of Modern Foreign Language Writing Proficiency
Test results should be determined as little as possible by chance factors, since such factors harm the reliability of the results. In testing writing proficiency, the open-ended tasks involved expose the assessment to a variety of chance factors. It is well known that when several teachers rate the same written work, their outcomes tend to differ considerably. The subjectivity that shows up here also affects the assessment of modern foreign language writing proficiency in school-based exams. The application of different standards may be hard to rule out, but it is unacceptable in principle. For this reason, CITO has been attempting to improve the agreement among raters of foreign language writing proficiency. Research was carried out to determine whether giving teachers information about the severity (or mildness) of their assessments enhances inter-rater agreement. The investigation rested on the following argument: teachers are often unaware of how their assessments compare to those of their colleagues. If there were a means enabling them to determine, independently and anonymously, whether they belong to the mild or the severe assessors, this information might prompt them to correct their 'deviant' behaviour and aim at the mean. If many teachers were to calibrate their evaluations in this way, the objectivity of assessment might improve.
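The abstract does not describe how the severity feedback was actually derived. As a minimal sketch of one plausible approach, assuming each rater scores the same set of letters and is compared against the pooled mean (the function name, data, and threshold below are all illustrative, not taken from the study):

```python
# Hypothetical sketch: label raters mild/average/severe by comparing each
# rater's mean score to the pooled mean over all raters. The article does
# not specify the actual procedure; the margin of 0.5 is illustrative.

def classify_raters(scores, margin=0.5):
    """scores: dict mapping rater name -> list of scores for the same letters."""
    pooled_mean = sum(sum(s) for s in scores.values()) / sum(len(s) for s in scores.values())
    labels = {}
    for rater, vals in scores.items():
        rater_mean = sum(vals) / len(vals)
        if rater_mean > pooled_mean + margin:
            labels[rater] = "mild"    # systematically higher scores than the group
        elif rater_mean < pooled_mean - margin:
            labels[rater] = "severe"  # systematically lower scores than the group
        else:
            labels[rater] = "average"
    return labels

ratings = {
    "rater_a": [7, 8, 6, 7],
    "rater_b": [5, 6, 5, 5],
    "rater_c": [6, 7, 6, 6],
}
print(classify_raters(ratings))
# {'rater_a': 'mild', 'rater_b': 'severe', 'rater_c': 'average'}
```

Feedback of this kind would let a 'mild' or 'severe' rater see their own position relative to the group without any colleague's individual scores being disclosed.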
In order to find out whether such an effect occurs at all, an experiment was carried out in which 26 teachers rated 25 letters written by pupils (in German, French or English), following precise instructions. The results were reported back to them. More than a year later, the teachers were asked to rate the same letters once more, following the same instructions, and were reminded of the results of the previous round (with comments such as: 'last time your assessments were pretty severe'). The inter-rater agreement turned out to be neither higher nor lower than the first time: the information about relative mildness or severity apparently had not had the desired effect. One explanation could be that, contrary to what had been hoped, teachers are simply not able to adjust their rating. The fact that even the raters in the 'average' group (who, of course, were not supposed to adjust their rating) showed considerable variation across the two rounds (the mean intra-rater agreement was .79) would seem to support this explanation.
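The abstract reports a mean intra-rater agreement of .79 but does not name the coefficient used. As a minimal sketch, assuming agreement is measured as a Pearson correlation over per-letter scores (the score lists below are invented for illustration), inter-rater agreement correlates two raters within one round, while intra-rater agreement correlates the same rater's scores across the two rounds:

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical scores for the same five letters:
round1_rater = [6, 7, 5, 8, 6]   # a rater's first-round scores
round2_rater = [7, 7, 4, 8, 5]   # the same rater's second-round scores
colleague    = [5, 7, 6, 8, 6]   # a colleague's scores on the same letters

print(f"intra-rater agreement: {pearson(round1_rater, round2_rater):.2f}")
print(f"inter-rater agreement: {pearson(round1_rater, colleague):.2f}")
```

On this reading, an intra-rater agreement of .79 means teachers did not even reproduce their own earlier judgements very closely, which is why the result casts doubt on their ability to adjust deliberately toward the mean.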
Article language: Dutch