Chapter 4
Statistical methodology for developing vertical scales for language tests
This chapter is the last of three chapters that describe the design and
implementation of a vertical linking project for language tests, drawing on
a multi-year project for the TOEFL® Family of Assessments. It focuses on the
statistical methodology used to develop the vertical scales for listening and
reading. Topics covered include the common-item linking design, the Item
Response Theory (IRT) scaling and linking procedures used to build the
vertical scales, and the method for evaluating the measurement precision of
the language tests across proficiency levels on the vertical scales.
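The abstract does not spell out the linking computation. Because Stocking and Lord (1983) appears in the reference list, the following is a minimal sketch of common-item linking with the Stocking-Lord test characteristic curve (TCC) criterion. It assumes a 3PL model with D = 1.7, a linear transformation theta_base = A * theta_new + B, a theta grid from -4 to 4, mean/sigma starting values, and a simple compass search as the optimizer. All function names, item parameters, and these choices are illustrative assumptions, not the chapter's implementation.

```python
import math
import statistics

D = 1.7  # logistic scaling constant (assumption)


def p3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def tcc(theta, items):
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(p3pl(theta, a, b, c) for a, b, c in items)


def rescale(items, A, B):
    """Place new-test item parameters on the base scale (theta_base = A*theta_new + B)."""
    return [(a / A, A * b + B, c) for a, b, c in items]


def mean_sigma(base_items, new_items):
    """Mean/sigma constants from the common items' b-parameters (used as starting values)."""
    b_base = [b for _, b, _ in base_items]
    b_new = [b for _, b, _ in new_items]
    A = statistics.pstdev(b_base) / statistics.pstdev(b_new)
    B = statistics.mean(b_base) - A * statistics.mean(b_new)
    return A, B


def stocking_lord(base_items, new_items, start=None, step=0.25, tol=1e-6):
    """Find (A, B) minimizing the squared TCC difference over a theta grid."""
    grid = [-4.0 + 0.2 * k for k in range(41)]  # hypothetical quadrature grid
    base_curve = [tcc(t, base_items) for t in grid]

    def loss(A, B):
        if A <= 0:
            return math.inf  # the slope constant must stay positive
        moved = rescale(new_items, A, B)
        return sum((bt - tcc(t, moved)) ** 2 for t, bt in zip(grid, base_curve))

    A, B = start if start is not None else mean_sigma(base_items, new_items)
    best = loss(A, B)
    # Compass search: try +/- step on A and B; halve the step when no move improves.
    while step > tol:
        for dA, dB in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            trial = loss(A + dA, B + dB)
            if trial < best:
                A, B, best = A + dA, B + dB, trial
                break
        else:
            step /= 2.0
    return A, B


# Hypothetical common items (a, b, c) as calibrated on the base test.
base = [(0.8, -1.5, 0.2), (1.0, -0.5, 0.2), (1.2, 0.0, 0.2), (0.9, 0.8, 0.2), (1.1, 1.6, 0.2)]
# The same items as they would appear on a new test's scale if A = 1.2 and B = 0.5.
new = [(a * 1.2, (b - 0.5) / 1.2, c) for a, b, c in base]

A, B = stocking_lord(base, new, start=(1.0, 0.0))
print(round(A, 3), round(B, 3))
```

Because the hypothetical new-test parameters are an exact linear transformation of the base parameters, the search should recover constants close to A = 1.2 and B = 0.5. With real calibrations the common items would not align exactly, which is where screening of outlier linking items would come in.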
Article outline
- Introduction
- Overview of the psychometric process for developing the TOEFL vertical scales for reading and listening
- Data collection through operational test forms
- Selecting the base test
- Calibration and scaling of vertical linking items
- Estimating linking constants across tests and removal of outlier vertical linking items
- Placing score scales of all tests onto the vertical scale
- Evaluating measurement precision across language proficiency levels
- Conclusion
- Notes
- References
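The abstract does not state which precision index the chapter uses. One common IRT approach, assumed here, is to compare the conditional standard error of measurement (CSEM), 1 / sqrt(I(theta)), of each test at points along the common theta scale. The sketch assumes 3PL items with D = 1.7 whose parameters are already on the vertical scale; the two tests, their names, and their item parameters are hypothetical.

```python
import math

D = 1.7  # logistic scaling constant (assumption)


def p3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def item_information(theta, a, b, c):
    """Fisher information of one 3PL item at theta."""
    p = p3pl(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def csem(theta, items):
    """Conditional standard error of measurement: 1 / sqrt(test information)."""
    info = sum(item_information(theta, a, b, c) for a, b, c in items)
    return 1.0 / math.sqrt(info)


# Hypothetical item parameters (a, b, c) already placed on the vertical scale.
tests = {
    "lower-level test": [(1.0, -1.6, 0.2), (1.2, -1.0, 0.2), (0.9, -0.4, 0.2), (1.1, -1.2, 0.2)],
    "higher-level test": [(1.0, 0.4, 0.2), (1.2, 1.0, 0.2), (0.9, 1.6, 0.2), (1.1, 1.2, 0.2)],
}

# Report CSEM for each test at several proficiency levels on the common scale.
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    row = ", ".join(f"{name}: {csem(theta, items):.2f}" for name, items in tests.items())
    print(f"theta = {theta:+.1f} -> CSEM {row}")
```

A lower CSEM means more precise measurement at that proficiency level, so a test targeted at lower levels would be expected to show its smallest errors toward the lower end of the vertical scale and a more advanced test toward the upper end.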
References
Council of Europe. (2001). The Common European Framework of Reference for Languages: Learning, teaching, assessment. Cambridge University Press.
ETS. (2020). TOEFL® program history. Retrieved on 7 February 2023 from [URL]
Kolen, M. J. (2006). Scaling and norming. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 156–186). American Council on Education; Praeger.
Monfils, L., & Manna, V. F. (this volume). Considerations in developing vertical scales for language tests. In S. Papageorgiou & V. F. Manna (Eds.), Meaningful language test scores: Research to enhance score interpretation. John Benjamins.
Papageorgiou, S., Ginsburgh, M., & Garcia Gomez, P. (this volume). Assessment design issues in developing vertical scales for language tests. In S. Papageorgiou & V. F. Manna (Eds.), Meaningful language test scores: Research to enhance score interpretation. John Benjamins.
Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201–210.
Yen, W. M., & Fitzpatrick, A. (2006). Item response theory. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 111–153). American Council on Education; Praeger.