The cognitive processes elicited by L2 listening test tasks – A validation study
This paper investigates the validity of a listening comprehension test that was developed for a large-scale assessment project. The study draws on qualitative data collected with a think-aloud technique and stimulated recall interviews. The informants (n=18) were purposefully and randomly sampled from a group (n=121) of year 9 learners (ages 14–16) of English as a foreign language (EFL) in German schools. Subjects were asked to think aloud while they were solving the multiple-choice items of the listening test. Construct-relevant and construct-irrelevant processes were identified and analysed with regard to their distribution across the two subsamples and their relative contribution to correct item responses. The results provide validity evidence for the listening tests in general. A few test items, however, were shown to elicit test-taking processes and strategies that compromise the measurement outcomes.