L'évaluation de l'expression orale
Problèmes et solutions
This paper discusses some findings of a concurrent validation research project reported in the author's thesis (1977). In this project, two samples of 100 secondary school students took a half-hour oral interview test, which served to establish a criterion measure, and three 15-minute tests that were validated against that criterion. One of these three tests was a traditional test (test 1); the other two were highly structured experimental tests, designed to test oral proficiency as reliably as possible (tests 2a and 2b). The two samples represented two different levels of secondary education; each sample took a complete series of tests, and the two series were identical in form but differed in content. Correlations between the criterion and test 1 were significantly lower than correlations between the criterion and tests 2a and 2b; the latter exceeded .80, which suggests that these tests are valid. The difference between test 1 and tests 2a and 2b in this regard can be explained by the considerable differences in their rater reliability: even when scored by only one rater, tests 2a and 2b had a rater reliability exceeding .90 in most cases. The very good rater reliability of tests 2a and 2b is essentially due to the following characteristics of these tests and of the way they are rated:
-instead of giving only one mark at the end of the test, the rater scores each of a large number of answers separately;
-the pictures used in test 2a and the L1 key words used in test 2b fully determine the content of the answers the students have to give, so the rater can concentrate on rating the wording of the answers without being distracted by differences in content.
The answers in the interview test, i.e. the criterion, were also rated separately; this made the half-hour interview a reliable test (rated by one rater: .90), but only half as efficient as the 15-minute tests 2a and 2b. The contribution of the analytic method of rating to rater reliability was also investigated, but proved to be of little importance. The interview and tests 2a and 2b represent three types of test, each of which seems to have its own qualities and limitations. The interview allows the student to put his or her own ideas into words, but it is less reliable and less efficient; tests 2a and 2b are very reliable and efficient (they can also serve for the construction of achievement tests), but the use of pictures confines test 2a to the testing of concrete language, and the use of L1 key words in test 2b may tempt the student simply to translate.
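The validity and rater-reliability figures reported above are correlation coefficients. As a minimal sketch of how such figures can be obtained, the following assumes that Pearson's product-moment correlation is used (the abstract does not name the coefficient). The function name `pearson_r` and all scores are hypothetical illustrations, not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2 scores)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list has no variance")
    return sxy / sqrt(sxx * syy)


# Hypothetical total scores for six students (invented for illustration only).
test_rater_a = [34, 41, 28, 45, 38, 30]  # structured test, scored by rater A
test_rater_b = [33, 43, 27, 44, 36, 32]  # same answers, scored by rater B
criterion = [61, 72, 50, 80, 70, 55]     # interview (criterion) scores

# Rater reliability: agreement between two independent raters of the same test.
rater_reliability = pearson_r(test_rater_a, test_rater_b)

# Concurrent validity: agreement between the test and the criterion measure.
validity = pearson_r(test_rater_a, criterion)

print(f"rater reliability: {rater_reliability:.2f}")
print(f"validity against criterion: {validity:.2f}")
```

Computed this way, the same coefficient serves both purposes: applied to two raters' scores for one test it estimates rater reliability, and applied to a test and the interview scores it estimates concurrent validity.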
Article language: Dutch