Assessing spoken-language interpreting: The method of comparative judgement
In this study, we applied and evaluated a scoring method known as comparative judgement to assess spoken-language interpreting. This methodological exploration extends previous efforts to optimise scoring methods for assessing interpreting. Essentially, comparative judgement requires judges to compare two similar objects and make a binary decision about their relative quality. To evaluate its reliability, validity and usefulness in the assessment of interpreting, we recruited two groups of judges (novice and experienced) to assess 66 two-way English/Chinese interpretations using a computerised comparative judgement system. Our data analysis shows that the new method produced reliable and valid results across judge types and interpreting directions. However, the judges held polarised opinions about the method’s usefulness: while some considered it convenient, efficient and reliable, others expressed the opposite view. We discuss the results through an integrated analysis of the data collected, outline the perceived drawbacks and propose possible solutions. We call for more evidence-based, substantive investigation into comparative judgement as a potentially useful method for assessing spoken-language interpreting in certain settings.
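To make the idea of binary pairwise decisions concrete, the following is a minimal Python sketch of how a set of "A is better than B" judgements can be turned into a quality scale. It assumes a Bradley-Terry model fitted with a simple iterative update; the abstract does not state which statistical model the computerised system used, so the model choice, the function name `fit_bradley_terry` and the toy judgement data are all illustrative assumptions rather than the study's procedure.

```python
import math


def fit_bradley_terry(comparisons, n_items, iterations=200):
    """Estimate one log-strength per item from (winner, loser) index pairs.

    Hypothetical helper for illustration only. It assumes every item wins
    at least once; otherwise its estimate would be unbounded.
    """
    wins = [0] * n_items
    pair_counts = {}
    for winner, loser in comparisons:
        wins[winner] += 1
        key = (min(winner, loser), max(winner, loser))
        pair_counts[key] = pair_counts.get(key, 0) + 1

    if any(w == 0 for w in wins):
        raise ValueError("each item needs at least one win for a finite estimate")

    strength = [1.0] * n_items
    for _ in range(iterations):
        updated = []
        for i in range(n_items):
            # Sum over every pair that involves item i.
            denom = 0.0
            for (a, b), count in pair_counts.items():
                if i == a or i == b:
                    other = b if i == a else a
                    denom += count / (strength[i] + strength[other])
            updated.append(wins[i] / denom)
        # Rescale so the strengths keep a fixed total (the scale is arbitrary).
        total = sum(updated)
        strength = [s * n_items / total for s in updated]

    return [math.log(s) for s in strength]


# Toy data (invented): three interpretations, indexed 0, 1 and 2.
# Each tuple records one binary decision as (preferred, not preferred).
judgements = [
    (0, 1), (0, 1), (1, 0),
    (1, 2), (1, 2), (2, 1),
    (0, 2), (0, 2), (2, 0),
]
scores = fit_bradley_terry(judgements, n_items=3)
ranking = sorted(range(3), key=lambda i: scores[i], reverse=True)
```

In this sketch no judge ever assigns a mark; the scale emerges only from the accumulated binary decisions, which is the core of the method the abstract describes.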
Article outline
- 1. Introduction
- 2. Comparative judgement
- 2.1 An introduction to comparative judgement
- 2.2 A comparative judgement approach to assessing spoken-language interpreting
- 2.3 Potential research gaps
- 3. Research questions
- 4. Method
- 4.1 Interpreting recordings
- 4.2 Participants
- 4.3 Online comparative judgement platform
- 4.4 Judge preparation and training
- 4.5 Procedures for comparative judgement
- 4.6 Post-hoc interview
- 4.7 Data analysis
- 5. Results
- 5.1 Reliability evidence
- 5.2 Validity evidence
- 5.3 Judges’ perceived usefulness
- 6. Discussion
- 6.1 Reliability
- 6.2 Validity
- 6.3 Perceived usefulness
- 6.4 Further analysis of potential drawbacks of comparative judgement
- 7. Conclusion
- Notes
- References