Measuring Writing Ability
Reliability Gains and Instrument Bias
Rob Schoonen | Stichting Centrum voor Onderwijsonderzoek UvA
In the assessment of writing ability, it is difficult and costly to rate students' texts reliably. Using more structured writing tasks and rating procedures could help avoid some of these assessment problems. However, structured tasks and rating procedures may threaten the validity of the writing assessment. This trade-off between (rating) reliability on the one hand and validity on the other is the main focus of this paper.
In the present study, five types of writing task, representing different degrees of structuring, were evaluated: two could be regarded as direct measurements, two as semi-direct measurements, and one as an indirect measurement.
An alternative way to improve rating reliability is to structure the rating procedure rather than the writing task. Two scoring methods were compared with respect to the ratings they produced for two tasks: the specified and the structured task. The methods consisted of (1) essay scales with example essays as 'range finders', and (2) scoring according to strict scoring guides (counts). The structuring of both the writing task and the rating procedure was expected to simplify the assessment of writing ability and to improve the reliability of the measurements.
The effects of writing task and rating procedure on the reliability of individual raters (five raters for the direct measurements, three for the semi-direct measurements) were analysed with ANOVAs. The results did not fully support the theoretical assumptions. As expected, however, the semi-direct tasks were rated far more reliably than the direct tasks. The (small) differences between the two direct tasks, and likewise between the two semi-direct tasks, showed a positive 'structuring effect', i.e. a gain in rating reliability. Furthermore, rating with scoring guides proved more reliable than rating with essay scales. This difference depended on the aspect of the texts being rated: rating reliability varied between the two procedures when content and organization were rated, whereas no such difference was found for language usage.
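As a rough illustration of how such inter-rater reliability can be quantified, the sketch below (in Python, with simulated data; it is not the paper's actual analysis) treats the raters of one task as 'items' and computes Cronbach's alpha over their scores. The rater counts (five direct, three semi-direct) follow the study design; all other numbers are assumptions.

```python
# Minimal sketch, assuming simulated data: inter-rater reliability per task,
# with raters treated as "items" and Cronbach's alpha computed over their scores.
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """scores: (n_students, n_raters) matrix of ratings for a single task."""
    n_raters = scores.shape[1]
    rater_vars = scores.var(axis=0, ddof=1)      # variance per rater
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of summed scores
    return n_raters / (n_raters - 1) * (1 - rater_vars.sum() / total_var)

rng = np.random.default_rng(0)
ability = rng.normal(size=100)                   # latent writing ability
# Assumed noise levels: direct tasks rated by five raters with more disagreement,
# semi-direct tasks by three raters with less disagreement.
direct = ability[:, None] + rng.normal(scale=1.0, size=(100, 5))
semi_direct = ability[:, None] + rng.normal(scale=0.4, size=(100, 3))
print(f"alpha, direct task:      {cronbach_alpha(direct):.2f}")
print(f"alpha, semi-direct task: {cronbach_alpha(semi_direct):.2f}")
```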
Possible instrument bias was studied as a way of evaluating the validity of the writing assignments. No bias with respect to sex could be demonstrated, but there was a bias with respect to writing ability. Proficient writers seemed unable to produce a distinctive performance on the semi-direct and indirect tasks, whereas less proficient writers seemed to benefit especially from the 'simplicity' of these more restrictive tasks. These effects appeared most strongly in the content and organization scores.
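One common way to probe this kind of bias (a sketch under assumed, simulated data, not necessarily the procedure used in the study) is to test an ability-by-task interaction: if the task type were unbiased, the gap between more and less proficient writers should be roughly constant across tasks.

```python
# Hedged sketch with simulated data: instrument bias as an ability-by-task
# interaction. The repeated-measures structure is ignored for simplicity.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(1)
n = 120
ability = rng.choice(["low", "high"], size=n)
df = pd.DataFrame({
    "student": np.repeat(np.arange(n), 2),
    "ability": np.repeat(ability, 2),
    "task": np.tile(["direct", "semi_direct"], n),
})
base = np.where(df["ability"] == "high", 1.0, -1.0)
# Simulated bias: proficient writers gain less from the restrictive task.
bias = np.where((df["ability"] == "high") & (df["task"] == "semi_direct"),
                -0.5, 0.0)
df["score"] = base + bias + rng.normal(scale=0.5, size=len(df))

model = smf.ols("score ~ ability * task", data=df).fit()
print(anova_lm(model, typ=2))  # a significant ability:task term suggests bias
```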
In line with other evaluations of the validity of writing assignments (cf. Schoonen, 1991), one must conclude that the gain in reliability resulting from the structuring of the writing assignment comes at the cost of validity, especially with respect to the content and organizational aspects of texts. Essays are best rated with essay scales rather than strict scoring guides (counts), especially where language use is concerned.
Article language: Dutch