Chapter 1. Overview of research to enhance score interpretation

Papageorgiou, Spiros; Manna, Venessa F.

doi:10.1075/illa.1.01pap

Part of

Meaningful Language Test Scores: Research to enhance score interpretation
Edited by Spiros Papageorgiou and Venessa F. Manna
[Innovations in Language Learning and Assessment 1] 2023
pp. 1–11


Spiros Papageorgiou | Educational Testing Service

Venessa F. Manna | Educational Testing Service

The primary motivation for administering a language test is to use its scores to support decisions of various kinds about language proficiency. Score-based decisions can be extremely consequential for test takers, for test score users such as universities and employers, and for society overall. This book presents research efforts to enhance the interpretation of English language test scores provided by Educational Testing Service (ETS), and ultimately their usefulness for decision making. This chapter first discusses the different approaches to enhancing score interpretation, then briefly presents the research reported in each chapter in relation to those approaches. It concludes with a summary of challenges related to the (mis)interpretation of language test scores.

Article outline

Test scores, decisions, and consequences
Approaches to enhancing score interpretation
Organization of the book
Conclusion
Notes
References

Published online: 29 June 2023


