References
Bock, R. D., & Zimowski, M. F. (1997). Multiple Group IRT. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 433–448). Springer.
Braun, H. I. (1988). A new approach to avoiding problems of scale in interpreting trends in mental measurement data. Journal of Educational Measurement, 25(3), 171–191.
Briggs, D. C., & Domingue, B. (2013). The gains from vertical scaling. Journal of Educational and Behavioral Statistics, 38(6), 551–576.
Briggs, D. C., & Weeks, J. P. (2009). The impact of vertical scaling decisions on growth interpretations. Educational Measurement: Issues and Practice, 28(4), 3–14.
Carlson, J. E. (2010). Statistical models for vertical linking. In A. A. von Davier (Ed.), Statistical models for test equating, scaling, and linking (pp. 59–70). Springer.
Crocker, L., & Algina, J. (1986). Introduction to modern and classical test theory. Holt, Rinehart, and Winston.
Deng, W., & Monfils, L. (2017). Long-term impact of valid case criterion on capturing population-level growth under item response theory equating (ETS Research Report Series No. RR–17–17). ETS.
Haberman, S. J. (2012). A general program for item-response analysis that employs the stabilized Newton-Raphson algorithm (Unpublished manuscript). ETS.
Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22, 144–149.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Sage.
Hanson, B. A., & Beguin, A. A. (1999). Separate versus concurrent estimation of IRT item parameters in the common item equating design (ACT Research Report Series, 99–8). ACT.
Hanson, B. A., & Beguin, A. A. (2002). Obtaining a common scale for item response theory item parameters using separate versus concurrent estimation in the common-item equating design. Applied Psychological Measurement, 26(1), 3–24.
Harris, D. J. (1991). A comparison of Angoff’s Design I and Design II for vertical equating using traditional and IRT methodology. Journal of Educational Measurement, 28(3), 221–235.
Harris, D. J. (2007). Practical issues in vertical scaling. In N. J. Dorans, M. Pommerich, & P. W. Holland (Eds.), Linking and aligning scores and scales (pp. 233–251). Springer.
Harris, D. J., & Hoover, H. D. (1987). An application of the three-parameter IRT model to vertical equating. Applied Psychological Measurement, 11(2), 151–159.
Holland, P. W. (2007). A framework and history for score linking. In N. J. Dorans, M. Pommerich, & P. W. Holland (Eds.), Linking and aligning scores and scales (pp. 5–30). Springer.
Hoskens, M., Lewis, D. M., & Patz, R. J. (2003). Maintaining vertical scalings using a common item design. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.
Ito, K., Sykes, R. C., & Yao, L. (2008). Concurrent and separate grade-groups linking procedures for vertical scaling. Applied Measurement in Education, 21(3), 187–206.
Kenyon, D. M., MacGregor, D., Li, D., & Cook, H. G. (2011). Issues in vertical scaling of a K–12 English language proficiency test. Language Testing, 28(3), 383–400.
Kim, S.-H., & Cohen, A. S. (1998). A comparison of linking and concurrent calibration under item response theory. Applied Psychological Measurement, 22(2), 131–143.
Kolen, M. J. (1981). Comparison of traditional and item response theory methods of equating tests. Journal of Educational Measurement, 18(1), 1–11.
Kolen, M. J. (2006). Scaling and norming. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 156–186). American Council on Education; Praeger.
Kolen, M. J. (2011). Issues associated with vertical scales for PARCC assessments. Retrieved on 6 February 2023 from [URL]
Kolen, M. J., & Brennan, R. L. (2014). Test equating, scaling, and linking: Methods and practices (3rd ed.). Springer.
Linn, R. L. (1993). Linking results of distinct assessments. Applied Measurement in Education, 6(1), 83–102.
Lord, F. M. (1975). The ‘ability’ scale in item characteristic curve theory. Psychometrika, 40(2), 205–217.
Martineau, J. A. (2006). Distorting value added: The use of longitudinal, vertically scaled student achievement data for growth-based, value-added accountability. Journal of Educational and Behavioral Statistics, 31(1), 35–62.
Masters, G. N., & Wright, B. D. (1997). The partial credit model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 101–122). Springer.
McNamara, T. F. (1996). Measuring second language performance. Longman.
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16(2), 159–176.
Patz, R. J., & Yao, L. (2007). Methods and models for vertical scaling. In N. J. Dorans, M. Pommerich, & P. W. Holland (Eds.), Linking and aligning scores and scales (pp. 253–272). Springer.
Petersen, N. S., Kolen, M. J., & Hoover, H. D. (1989). Scaling, norming, and equating. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 221–262). Macmillan.
Reckase, M. D. (2009). Multidimensional item response theory. Springer.
Reckase, M. D. (2010). Study of best practices for vertical scaling and standard setting with recommendations for FCAT 2.0. [URL]
Skaggs, G., & Lissitz, R. W. (1986). IRT test equating: Relevant issues and a review of recent research. Review of Educational Research, 56(4), 495–529.
Skaggs, G., & Lissitz, R. W. (1988). Effect of examinee ability on test equating invariance. Applied Psychological Measurement, 12(1), 69–82.
Slinde, J. A., & Linn, R. L. (1979). A note on vertical equating via the Rasch model for groups of quite different ability and tests of quite different difficulty. Journal of Educational Measurement, 16, 159–165.
Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7(2), 201–210.
Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 67–113). Lawrence Erlbaum Associates.
Tomkowicz, J., Zhang, L., & Yen, S. (2010). Comparison of vertical scaling maintenance methods and their impact on scale properties. Paper presented at the annual meeting of the National Council on Measurement in Education, Denver, CO.
Tong, Y., & Kolen, M. J. (2010). Scaling: An ITEMS module. Educational Measurement: Issues and Practice, 29(4), 39–48.
von Davier, M. (2008). A general diagnostic model applied to language testing data. British Journal of Mathematical and Statistical Psychology, 61(2), 287–307.
Wu, R. Y., & Liao, C. H. (2010). Establishing a common score scale for the GEPT Elementary, Intermediate, and High-Intermediate Level listening and reading tests. In T. Kao & Y. Li (Eds.), A new look at language teaching and testing: English as subject and vehicle – Selected papers from the 2009 LTTC International Conference on English Language Teaching and Testing (pp. 309–329). Language Training and Testing Center.
Yen, W. M. (1986). The choice of scale for educational measurement: An IRT perspective. Journal of Educational Measurement, 23(4), 299–325.
Yen, W. M. (2007). Vertical scaling and No Child Left Behind. In N. J. Dorans, M. Pommerich, & P. W. Holland (Eds.), Linking and aligning scores and scales (pp. 273–283). Springer.
Yen, W. M., & Fitzpatrick, A. (2006). Item response theory. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 111–153). American Council on Education; Praeger.
Young, M. J. (2006). Vertical scales. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 469–485). Lawrence Erlbaum Associates.