This article reports on a study of rater-mediated assessment of English-to-Chinese consecutive interpreting, focusing on the informational correspondence between the originally intended message and the actually rendered message, known in Interpreting Studies as “fidelity”. Previous literature documents two main methods of assessing fidelity: comparing actual renditions with the source text, or with an exemplar rendition carefully prepared by experts (i.e., an ideal target text). Little is known, however, about how these methods affect fidelity assessment. We therefore conducted a study to explore how they affect rater reliability, fidelity ratings, and rater perception. Our analysis of the quantitative data shows that raters tended to be less reliable, less self-consistent, less lenient, and less comfortable when rating against the source English text (i.e., Condition A) than against the exemplar Chinese rendition (i.e., Condition B). These findings were corroborated and explained by themes emerging from the qualitative questionnaire data. The fidelity estimates obtained under the two conditions were also found to be strongly correlated. We discuss these findings and entertain the possibility of recruiting untrained monolinguals or bilinguals to assess the fidelity of interpreting.
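As a concrete illustration of the quantitative comparison described above, the sketch below shows one simple way such condition effects could be operationalised: mean ratings as a leniency proxy, average pairwise inter-rater correlation as a consistency proxy, and a Pearson correlation between per-rendition fidelity estimates obtained under the two conditions. All data, variable names, and statistics here are hypothetical assumptions for illustration only; they are not the authors’ data or their actual analysis.

```python
# Minimal illustrative sketch (not the study's actual procedure or data).
# Assumes 5 raters scoring 12 renditions on a 1-8 fidelity scale, once per
# condition; the data below are randomly generated stand-ins.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)

# Hypothetical ratings: Condition A (against the source text) and
# Condition B (against an exemplar rendition), B shifted slightly upward
# to mimic greater leniency.
ratings_a = rng.integers(2, 9, size=(5, 12)).astype(float)
ratings_b = np.clip(ratings_a + rng.normal(0.5, 0.8, size=(5, 12)), 1, 8)

def mean_pairwise_r(ratings):
    """Average pairwise Pearson correlation among raters (consistency proxy)."""
    n = len(ratings)
    rs = [stats.pearsonr(ratings[i], ratings[j])[0]
          for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(rs))

# Leniency proxy: overall mean rating per condition.
print(f"Mean rating  A: {ratings_a.mean():.2f}  B: {ratings_b.mean():.2f}")

# Inter-rater consistency proxy per condition.
print(f"Consistency  A: {mean_pairwise_r(ratings_a):.2f}  "
      f"B: {mean_pairwise_r(ratings_b):.2f}")

# Cross-condition agreement: correlate per-rendition fidelity estimates
# (rater means) obtained under the two conditions.
r, p = stats.pearsonr(ratings_a.mean(axis=0), ratings_b.mean(axis=0))
print(f"Condition A vs. B fidelity estimates: r = {r:.2f} (p = {p:.3f})")
```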