American Council on the Teaching of Foreign Languages
(2012) ACTFL proficiency guidelines. Retrieved on 6 February 2023 from [URL]
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education
(2014) Standards for educational and psychological testing. American Educational Research Association. Retrieved on 6 February 2023 from [URL]
Angoff, W. H.
(1971) Scales, norms and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 508–600). American Council on Education.
Bachman, L. F., & Palmer, A. S.
(2010) Language assessment in practice: Developing language assessments and justifying their use in the real world. Oxford University Press.
Beaton, A., & Allen, N.
(1992) Interpreting scales through scale anchoring. Journal of Educational Statistics, 17(2), 191–204.
Centre for Canadian Language Benchmarks
(2012) Canadian language benchmarks: English as a second language for adults. Retrieved on 6 February 2023 from [URL]
Chapelle, C. A.
(2008) The TOEFL® validity argument. In C. Chapelle, M. Enright, & J. Jamieson (Eds.), Building a validity argument for the Test of English as a Foreign Language (pp. 319–352). Routledge.
(2020) Argument-based validation in testing and assessment. Sage.
Cho, Y., Ginsburgh, M., Morgan, R., Moulder, B., Xi, X., & Hauck, M. C.
(2016) Designing the TOEFL® Primary™ tests (Research Memorandum No. RM–16–02). ETS. Retrieved on 6 February 2023 from [URL]
Council of Europe
(2001) The Common European Framework of Reference for Languages: Learning, teaching, assessment. Cambridge University Press.
Educational Testing Service
(2010) Comparing TOEFL® and IELTS™ total scores. Retrieved on 10 March 2023 from [URL]
(2020) TOEFL® research insight series: Vol. 1. TOEFL iBT® test framework and test development. Retrieved on 6 February 2023 from [URL]
Fulcher, G.
(2016) Standards and frameworks. In D. Tsagari & J. Banerjee (Eds.), Handbook of second language assessment (pp. 29–44). De Gruyter Mouton.
Garcia Gomez, P., Noah, A., Schedl, M., Wright, C., & Yolkut, A.
(2007) Proficiency descriptors based on a scale-anchoring study of the new TOEFL iBT reading test. Language Testing, 24(3), 417–435.
Harris, D. J.
(2007) Practical issues in vertical scaling. In N. J. Dorans, M. Pommerich, & P. W. Holland (Eds.), Linking and aligning scores and scales (pp. 233–251). Springer.
Kane, M.
(2013) Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73.
Kane, M. T.
(1992) An argument-based approach to validity. Psychological Bulletin, 112(3), 527–535.
Kolen, M. J.
(2006) Scaling and norming. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 156–186). Praeger.
Liao, C.-W.
(2010) TOEIC® Listening and Reading Test scale anchoring study (ETS Rep. TC–10–05). ETS. Retrieved on 6 February 2023 from [URL]
Lord, F. M.
(1980) Applications of Item Response Theory to practical testing problems. Lawrence Erlbaum Associates.
Messick, S.
(1989) Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). Macmillan.
(1996) Validity and washback in language testing. Language Testing, 13(3), 241–256.
Milanovic, M., & Weir, C. J.
(2010) Series editors’ note. In W. Martyniuk (Ed.), Relating language examinations to the Common European Framework of Reference for Languages: Case studies and reflections on the use of the Council of Europe’s Draft Manual (pp. viii–xx). Cambridge University Press.
National Education Examinations Authority
(2018) China’s standards of English language ability. Retrieved on 6 February 2023 from [URL]
Papageorgiou, S., & Cho, Y.
(2014) An investigation of the use of TOEFL Junior Standard scores for ESL placement decisions in secondary education. Language Testing, 31(2), 223–239.
Papageorgiou, S., Davis, L., Norris, J. M., Garcia Gomez, P., Manna, V. F., & Monfils, L.
(2021) Design framework for the TOEFL® Essentials™ test 2021 (Research Memorandum No. RM–21–03). ETS. Retrieved on 6 February 2023 from [URL]
Papageorgiou, S., & Manna, V. F.
(2021) Maintaining access to a large-scale test of academic language proficiency during the pandemic: The launch of TOEFL iBT Home Edition. Language Assessment Quarterly, 18(1), 36–41.
Papageorgiou, S., & Tannenbaum, R. J.
(2016) Situating standard setting within argument-based validity. Language Assessment Quarterly, 13(2), 109–123.
Papageorgiou, S., Wu, S., Hsieh, C.-N., Tannenbaum, R. J., & Cheng, M. M.
(2019) Mapping the TOEFL iBT® test scores to China’s standards of English language ability: Implications for score interpretation and use (Research Report No. TOEFL-RR–89). ETS.
Powers, D., Schedl, M., & Papageorgiou, S.
(2017) Facilitating the interpretation of English language proficiency scores: Combining scale anchoring and test score mapping methodologies. Language Testing, 34(2), 175–195.
Ryan, J.
(2006) Practices, issues, and trends in student test score reporting. In S. Downing & T. Haladyna (Eds.), Handbook of test development (pp. 677–710). Lawrence Erlbaum Associates.
So, Y., Wolf, M. K., Hauck, M. C., Mollaun, P., Rybinski, P., Tumposky, D., & Wang, L.
(2015) TOEFL Junior® design framework (Research Report No. RR–15–13). ETS.
Zwick, R., Senturk, D., Wang, J., & Loomis, S. C.
(2001) An investigation of alternative methods for item mapping in the National Assessment of Educational Progress. Educational Measurement: Issues and Practice, 20(2), 15–25.