Chapter 4. Statistical methodology for developing vertical scales for language tests

Gu, Lixiong; Li, Yanmei; Monfils, Lora; Papageorgiou, Spiros

doi:10.1075/illa.1.04gu

Part of

Meaningful Language Test Scores: Research to enhance score interpretation
Edited by Spiros Papageorgiou and Venessa F. Manna
[Innovations in Language Learning and Assessment 1] 2023
pp. 61–77

Lixiong Gu | Educational Testing Service

Yanmei Li | Educational Testing Service

Lora Monfils | Educational Testing Service

Spiros Papageorgiou | Educational Testing Service

This chapter is the last of three chapters that describe the design and implementation of a vertical linking project for language tests, drawing on a multi-year project for the TOEFL® Family of Assessments. The chapter focuses on the statistical methodology for developing the listening and reading vertical scales. Topics covered include the common-item linking design, the Item Response Theory (IRT) scaling and linking procedures used to build the vertical scales, and the method for evaluating the measurement precision of the language tests across the proficiency levels on the vertical scales.
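The reference list includes Stocking and Lord (1983), the test characteristic curve method for estimating the linking constants that place one test's IRT scale onto another's through common items. The following is a minimal sketch of that criterion, not the chapter's implementation. It assumes the three-parameter logistic (3PL) model with scaling constant D = 1.7, a theta grid from -4 to 4, and a simple compass search standing in for whatever optimizer was used operationally. All item parameters are hypothetical.

```python
import math

D = 1.7  # logistic scaling constant (assumed; not stated in the abstract)


def p3pl(theta, a, b, c):
    """3PL probability of a correct response, using a numerically stable logistic."""
    z = D * a * (theta - b)
    if z >= 0:
        logistic = 1.0 / (1.0 + math.exp(-z))
    else:
        ez = math.exp(z)
        logistic = ez / (1.0 + ez)
    return c + (1.0 - c) * logistic


def tcc(theta, items):
    """Test characteristic curve: expected number-correct score on `items`."""
    return sum(p3pl(theta, a, b, c) for a, b, c in items)


def stocking_lord(base_items, new_items, grid=None, tol=1e-7, max_iter=20000):
    """Estimate slope A and intercept B such that theta_base = A * theta_new + B.

    Minimizes the Stocking-Lord criterion: squared differences between the base
    TCC and the transformed new-test TCC of the common items, summed over a
    theta grid. A compass search (hypothetical choice) does the minimization.
    """
    if grid is None:
        grid = [-4.0 + 0.1 * k for k in range(81)]  # theta from -4 to 4
    base_curve = [tcc(t, base_items) for t in grid]

    def loss(A, B):
        # Put new-test parameters on the base scale: a/A, A*b + B, c unchanged.
        moved = [(a / A, A * b + B, c) for a, b, c in new_items]
        return sum((bc - tcc(t, moved)) ** 2 for t, bc in zip(grid, base_curve))

    A, B, step = 1.0, 0.0, 0.5
    best = loss(A, B)
    for _ in range(max_iter):
        if step < tol:
            break
        for dA, dB in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand_A, cand_B = A + dA, B + dB
            if cand_A <= 0:  # the slope must stay positive
                continue
            value = loss(cand_A, cand_B)
            if value < best:
                A, B, best = cand_A, cand_B, value
                break
        else:
            step /= 2.0  # no direction improved, so shrink the step
    return A, B


# Synthetic check: build "new test" estimates by inverting a known transformation.
base = [(1.0, -1.0, 0.2), (1.3, 0.0, 0.15), (0.8, 1.0, 0.25), (1.5, 0.5, 0.2)]
true_A, true_B = 1.2, 0.5
new = [(a * true_A, (b - true_B) / true_A, c) for a, b, c in base]
A, B = stocking_lord(base, new)
```

Because `new` is constructed from `base` with known constants, the search should recover A close to 1.2 and B close to 0.5. Screening the common items for outliers before estimating the final constants, which the outline lists as a separate step, is not shown here.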

Article outline

Introduction
Overview of the psychometric process for developing the TOEFL vertical scales for reading and listening
Data collection through operational test forms
Selecting the base test
Calibration and scaling of vertical linking items
Estimating linking constants across tests and removal of outlier vertical linking items
Placing score scales of all tests onto the vertical scale
Evaluating measurement precision across language proficiency levels
Conclusion
Notes
References

Published online: 29 June 2023

References

Council of Europe. (2001). The Common European Framework of Reference for Languages: Learning, teaching, assessment. Cambridge University Press.

ETS. (2020). TOEFL® program history. Retrieved February 7, 2023, from [URL]

Kolen, M. J. (2006). Scaling and norming. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 156–186). American Council on Education; Praeger.

Monfils, L., & Manna, V. F. (this volume). Considerations in developing vertical scales for language tests. In S. Papageorgiou & V. F. Manna (Eds.), Meaningful language test scores: Research to enhance score interpretation. John Benjamins.

Papageorgiou, S., Ginsburgh, M., & Garcia Gomez, P. (this volume). Assessment design issues in developing vertical scales for language tests. In S. Papageorgiou & V. F. Manna (Eds.), Meaningful language test scores: Research to enhance score interpretation. John Benjamins.

Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201–210.

Yen, W. M., & Fitzpatrick, A. (2006). Item response theory. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 111–153). American Council on Education; Praeger.