Over the past decade, interpretation assessment has played an increasingly important role in interpreter education, professional certification, and interpreting research. The time-honored assessment method analyzes (para)linguistic features of an interpretation, such as omissions, substitutions, filled and unfilled pauses, and self-corrections. Recently, descriptor-based rating scales have emerged as a viable alternative for assessing interpretation (e.g., Angelelli 2009; Han 2015, 2016; J. Lee 2008; Tiselius 2009), arguably providing a basis for reliable, valid and practical assessments. However, little work has been done in interpreting studies to verify the assumed benefits of this emerging practice. Drawing on 17 international peer-reviewed journals over a twelve-year period (2004–2015) and on other related publications (e.g., scholarly books, reports, documents), this article provides an overview of practices in scale-based interpretation assessment, focusing on four major aspects: (a) rating scales; (b) raters; (c) rating procedures; (d) reporting of assessment outcomes. It examines problem areas and possible emerging trends in interpretation assessment and identifies a number of future research needs.
(1993) A psychometric approach to the selection of translation and interpreting students in Taiwan. Perspectives 1 (1), 91–104.
Arocha, I. S. & Joyce, L.
(2013) Patient safety, professionalization, and reimbursement as primary drivers for national medical interpreter certification in the United States. Translation & Interpreting 5 (1), 127–142.
Bachman, L. F. & Palmer, A. S.
(1996) Language testing in practice. Oxford, UK: Oxford University Press.
Bachman, L. F.
(2004) Statistical analyses for language assessment. Cambridge, UK: Cambridge University Press.
Barik, H. C.
(1973) Simultaneous interpretation: Temporal and quantitative data. Language and Speech 16 (3), 237–270.
Barkaoui, K.
(2007) Rating scale impact on EFL essay marking: A mixed-method study. Assessing Writing 121, 86–107.
Barkaoui, K.
(2010) Variability in ESL essay rating processes: The role of rating scale and rater experience. Language Assessment Quarterly 7 (1), 54–74.
(2011) Striving for an ‘A’ grade: A case study in performance management of interpreters. International Journal of Interpreter Education 31, 56–71.
Brennan, R. L.
(2001) Generalizability theory. New York: Springer.
Bühler, H.
(1986) Linguistic (semantic) and extra-linguistic (pragmatic) criteria for the evaluation of conference interpretation and interpreters. Multilingua 5 (4), 231–235.
Campbell, S. & Hale, S.
(2003) Translation and interpreting assessment in the context of educational measurement. In G. Anderman & M. Rogers (Eds.), Translation today: Trends and perspectives. Clevedon: Multilingual Matters, 205–224.
Carroll, J. B.
(1966) An experiment in evaluating the quality of translations. Mechanical Translation and Computational Linguistics 9 (3–4), 55–66.
CCHI
(2011) Technical report on the development and pilot testing of the CCHI examinations. Washington, DC: Certification Commission for Healthcare Interpreters. [URL] (accessed 22 May 2015).
CCHI
(2012) Technical report on the development and pilot testing of the Certified Healthcare Interpreter™ (CHI™) examination for Arabic and Mandarin. Washington, DC: Certification Commission for Healthcare Interpreters. [URL] (accessed 22 May 2015).
Chen, J.
(2009) Authenticity in accreditation tests for interpreters in China. The Interpreter and Translator Trainer 3 (2), 257–273.
(2009) Exploring translation and interpreting hybrids: The case of sight translation. Meta 54 (3), 588–604.
East, M. & Young, D.
(2007) Scoring L2 writing samples: Exploring the relative effectiveness of two different diagnostic methods. New Zealand Studies in Applied Linguistics 13 (1), 1–21.
Engelhard, G.
(1994) Examining rater errors in the assessment of written composition with a many-faceted Rasch model. Journal of Educational Measurement 31 (2), 93–112.
Engelhard, G.
(1996) Evaluating rater accuracy in performance assessments. Journal of Educational Measurement 33 (1), 56–70.
Feng, J. Z.
(2005) 论口译测试的规范化. [Towards the standardization of interpretation testing]. 外语研究, 891, 54–58.
(2013) Evaluating assessment practices at the MCI in Cyprus. In D. Tsagari & R. van Deemter (Eds.), Assessment issues in language translation and interpreting. Frankfurt: Peter Lang, 145–162.
Fulcher, G.
(1996) Does thick description lead to smart tests? A data-based approach to rating scale construction. Language Testing 13 (2), 208–238.
Fulcher, G., Davidson, F. & Kemp, J.
(2011) Effective rating scale development for speaking tests: Performance Decision Trees. Language Testing 28 (1), 5–29.
Garant, M.
(2009) A case for holistic assessment. AFinLA-e Soveltavan kielitieteen tutkimuksia 11, 5–17.
Gile, D.
(1999) Variability in the perception of fidelity in simultaneous interpretation. Hermes 221, 51–79.
Goldman-Eisler, F.
(1967) Segmentation of input in simultaneous interpretation. Journal of Psycholinguistic Research 11, 127–140.
Goulden, N. R.
(1992) Theory and vocabulary for communication assessments. Communication Education 41 (3), 258–269.
(2014) Monolingual short courses for language-specific accreditation: Can they work? A Sydney experience. The Interpreter and Translator Trainer 8 (2), 1–23.
Hale, S., Garcia, I., Hlavac, J., Kim, M., Lai, M., Turner, B. & Slatyer, H.
(2012) Development of a conceptual overview for a new model for NAATI standards, testing and assessment. Sydney, Australia. [URL] (accessed 22 May 2015).
Hamidi, M. & Pöchhacker, F.
(2007) Simultaneous consecutive interpreting: A new technique put to the test. Meta 52 (2), 276–289.
Hlavac, J.
(2013) A cross-national overview of translator and interpreter certification procedures. Translation & Interpreting 51, 32–65.
Han, C.
(2014) Measuring rater variability in interpreter performance testing: Using classical test theory, G theory and Rasch measurement. Paper presented at the Biennial Conference of the Association for Language Testing and Assessment of Australia and New Zealand at the University of Queensland, 27–29 November 2014.
Han, C.
(2016) Investigating score dependability in English/Chinese interpreter certification performance testing: A generalizability theory approach. Language Assessment Quarterly 13 (3), 186–201.
(2013) Developing analytic rating guides for TOEFL iBT® integrated speaking tasks. [URL] (accessed 12 June 2015).
Kelly, N.
(2007) Interpreter certification programs in the U.S.: Where are we headed? The ATA Chronicle 36 (1), 31–39.
Knoch, U.
(2009) Diagnostic writing assessment: The development and validation of a rating scale. Frankfurt: Peter Lang.
Knoch, U.
(2011) Rating scales for diagnostic assessment of writing: What should they look like and where should the criteria come from? Assessing Writing 16 (2), 81–96.
Ko, L.
(2008) Teaching interpreting by distance mode: An empirical study. Meta 53 (4), 814–840.
Lee, J.
(2008) Rating scales for interpreting performance assessment. The Interpreter and Translator Trainer 2 (2), 165–184.
(2005) Self-assessment as an autonomous learning tool in an interpretation classroom. Meta 50 (4).
Lim, H. -O.
(2006) A comparison of curricula of graduate schools of interpretation and translation in Korea. Meta 51 (2), 215–228.
Lin, I. I., Chang, F. A., & Kuo, F.
(2013) The impact of non-native accented English on rendition accuracy in simultaneous interpreting. Translation & Interpreting 5 (2), 30–44.
Liu, M.
(2013) Design and analysis of Taiwan’s interpretation certification examination. In D. Tsagari & R. van Deemter (Eds.), Assessment issues in language translation and interpreting. Frankfurt: Peter Lang, 163–178.
Liu, M., Chang, C. -C. & Wu, S. -C.
(2008) Interpretation evaluation practices: Comparison of eleven schools in Taiwan, China, Britain, and the USA. Compilation and Translation Review 1 (1), 1–42.
Llewellyn, J. P.
(1981) Simultaneous interpreting. In J. K. Woll & M. Deuchar (Eds.), Perspectives on British Sign Language & Deafness. London: Croom Helm, 89–104.
Lu, M., Liu, C. & Gong, X. F.
(2007) 全国翻译专业资格(水平)考试英语口译试题命制一致性研究报告. [How to maintain consistency in CATTI’s interpretation tests: A research report]. 中国翻译, 51, 57–61.
Lunz, M. E. & Stahl, J. A.
(1990) Judge consistency and severity across grading periods. Evaluation and the Health Professions 13 (4), 425–444.
Lynch, B. K. & McNamara, T. F.
(1998) Using G-theory and many-facet Rasch measurement in the development of performance assessments of the ESL speaking skills of immigrants. Language Testing 15 (2), 158–180.
(2007) Assessing dual-role staff-interpreter linguistic competency in an integrated healthcare system. Journal of General Internal Medicine 22 (Suppl 2), 331–335.
Myford, C. M. & Wolfe, E. W.
(2003) Detecting and measuring rater effects using many-facet Rasch measurement: Part I. Journal of Applied Measurement 4 (4), 386–422.
Napier, J.
(2004) Sign language interpreter training, testing, and accreditation: An international comparison. American Annals of the Deaf 149 (4), 350–359.
National Center for State Courts
(2013) Federal Court Interpreter Certification Examination for Spanish/English: Examinee handbook. [URL] (accessed 22 May 2015).
Office of China Accreditation Tests for Translators and Interpreters
(2005) 二级口译英语同声传译类考试大纲. [Syllabus of CATTI Level-two Simultaneous Interpreting Test]. Beijing: Foreign Languages Press.
Pellatt, V., Griffiths, K. & Wu, S. -C.
(Eds.) (2010) Teaching and testing interpreting and translating. Bern: Peter Lang.
Peng, K. -C.
(2006) The development of coherence and quality of performance in conference interpreter training. PhD Dissertation, University of Leeds.
Petronio, K. & Hale, K.
(2009) One interpreter education program, two sites: A comparison of factors and outcomes. International Journal of Interpreter Education 11, 46–61.
Pio, S.
(2003) The relation between ST delivery rate and quality in simultaneous interpretation. The Interpreters’ Newsletter 121, 69–100.
Pöchhacker, F.
(2001) Quality assessment in conference and community interpreting. Meta 46 (2), 410–425.
Pöchhacker, F.
(2004) Introducing interpreting studies. Shanghai: Shanghai Foreign Language Education Press.
(2010) Development and validation of oral and written examinations for medical interpreter certification: Technical report. Burbank, CA. [URL] (accessed 22 May 2015).
PSI Services LLC
(2013) Development and validation of oral examinations for medical interpreter certification: Mandarin, Russian, Cantonese, Korean, and Vietnamese forms. [URL] (accessed 22 May 2015).
Rennert, S.
(2010) The impact of fluency on the subjective assessment of interpreting quality. The Interpreters’ Newsletter 151, 101–115.
Ribas, M. A.
(2010) Formative assessment in the interpreting classroom: Using the portfolio with students beginning simultaneous interpreting. Current Trends in Translation Teaching and Learning 31, 97–131.
Roat, C. E.
(2006) Certification of health care interpreters in the United States: A primer, a status report and considerations for national certification. Los Angeles, CA. [URL] (accessed 22 May 2015).
Roels, B.
(2013) Certification of social interpreters in Flanders, Belgium: Assessment and politics. In D. Tsagari & R. van Deemter (Eds.), Assessment issues in language translation and interpreting. Frankfurt: Peter Lang, 179–197.
(2011) A contrastive analysis of the main benchmarking tools for research assessment in translation and interpreting: The Spanish approach. Perspectives 19 (3), 233–251.
(2007) Construct validation of analytic rating scales in a speaking assessment: Reporting a score profile and a composite. Language Testing 24 (3), 355–390.
(1995) Assessment of simultaneous interpreting. In C. Dollerup & V. Appel (Eds.), Teaching translation and interpreting 3: New horizons. Amsterdam: John Benjamins, 187–195.
(2013) Assessing interpreter aptitude in a variety of languages. In D. Tsagari & R. van Deemter (Eds.), Assessment issues in language translation and interpreting. Frankfurt: Peter Lang, 35–50.
Skyba, K.
(2014) Translators and interpreters certification in Australia, Canada, the USA and Ukraine: Comparative analysis. Comparative Professional Pedagogy 4 (3), 58–64.
Stenzl, C.
(1983) Simultaneous interpretation: Groundwork towards a comprehensive model. MA thesis, University of London.
Strong, M. & Rudser, S. F.
(1985) An assessment instrument for sign language interpreters. Sign Language Studies 491, 343–362.
Timarová, Š. & Ungoed-Thomas, H.
(2008) Admission testing for interpreting courses. The Interpreter and Translator Trainer 2 (1), 29–46.
Timarová, Š., Čeňková, I., Meylaerts, R., Hertog, E., Szmalec, A. & Duyck, W.
Tiselius, E.
(2009) Revisiting Carroll’s scales. In C. V. Angelelli & H. E. Jacobson (Eds.), Testing and assessment in translation and interpreting studies. Amsterdam: John Benjamins, 95–121.
Tsagari, D. & van Deemter, R.
(Eds.) (2013) Assessment issues in language translation and interpreting. Frankfurt: Peter Lang.
Turner, B., Lai, M. & Huang, N.
(2010) Error deduction and descriptors – a comparison of two methods of translation test assessment. Translation & Interpreting 2 (1), 11–23.
Upshur, J. & Turner, C. E.
(1995) Constructing rating scales for second language tests. ELT Journal 49 (1), 3–12.
(2013) Rethinking bifurcated testing models in the court interpreter certification process. In D. Tsagari & R. van Deemter (Eds.), Assessment issues in language translation and interpreting. Frankfurt: Peter Lang, 67–84.
Wang, B. H.
(2007) 口译能力评估和译员能力评估 – 口译的客观评估模式初探. [From interpreting competence to interpreter competence – a tentative model for objective assessment of interpreting]. 外语界 31, 44–50.
Wang, B. H.
(2011) 口译能力的评估模式及测试设计再探 – 以全国英语口译大赛为例. [Exploration of the assessment model and test design of interpreting competence]. 外语界 11, 66–71.
Wang, J. -H., Napier, J., Goswell, D. & Carmichael, A.
(2015) The design and application of rubrics to assess signed language interpreting performance. The Interpreter and Translator Trainer 9 (1), 83–103.
Wang, M. W. & Stanley, J. C.
(1970) Differential weighting: A review of methods and empirical studies. Review of Educational Research 41, 663–705.
Wigglesworth, G.
(1993) Exploring bias analysis as a tool for improving rater consistency in assessing oral interaction. Language Testing 10 (3), 305–319.
Wu, J., Liu, M. & Liao, C.
(2013) Analytic scoring in interpretation test: Construct validity and the halo effect. In H. -H. Liao, T. -E. Kao & Y. Lin (Eds.), The making of a translator: Multiple perspectives. Taipei: Bookman, 277–292.
Wu, S. C.
(2010) Assessing simultaneous interpreting: A study on test reliability and examiners’ assessment behavior. PhD thesis, Newcastle University.
Wu, S. C.
(2013) How do we assess students in the interpreting examinations? In D. Tsagari & R. van Deemter (Eds.), Assessment issues in language translation and interpreting. Frankfurt: Peter Lang, 15–33.
Xi, X. -M. & Mollaun, P.
(2006) Investigating the utility of analytic scoring for the TOEFL Academic Speaking Test (TAST). [URL] (accessed 15 June 2015).
Yan, J. X., Pan, J. & Wang, H. -H.
(2010) Learner factors, self-perceived language ability and interpreting learning: An Investigation of Hong Kong tertiary interpreting classes. The Interpreter and Translator Trainer 4 (2), 173–196.
Yan, J. X., Pan, J., Wu, H. & Wang, Y.
(2013) Mapping interpreting studies: The state of the field based on a database of nine major translation and interpreting journals (2000–2010). Perspectives 21 (3), 446–473.
Yeh, S. -P., & Liu, M.
(2006) 口譯評分客觀化初探:採用量表的可能性 [A more objective approach to interpretation evaluation: Exploring the use of scoring rubrics]. 國立編譯館館刊 34 (4), 57–78.
Youdelman, M.
(2013) The development of certification for healthcare interpreters in the United States. Translation & Interpreting 5 (1), 114–126.
Zanotti, M.
(2011) Authentic and valid assessment: Assessing the performance of public service interpreters. Investigation in University Teaching and Learning 71, 99–105.
Zhao, N. & Dong, Y. P.
(2013) 基于多面Rasch模型的交替传译测试效度验证. [Validation of a consecutive interpreting test based on multi-faceted Rasch model]. 解放军外国语学院学报 36 (1), 86–90.