Corpus report
The English Language Learner Insight, Proficiency and Skills Evaluation (ELLIPSE) Corpus
This paper introduces the open-source English Language Learner Insight, Proficiency and Skills Evaluation
(ELLIPSE) corpus. The corpus comprises ~6,500 essays written by English language learners (ELLs). All essays were written during
state-wide standardized annual testing in the United States. The essays were written on 29 different independent prompts that
required no background knowledge on the part of the writer. Individual difference information is made available for each essay,
including economic status, gender, grade level (8–12), and race/ethnicity. Two trained human raters scored each essay for
English language proficiency, assigning an overall proficiency score and analytic scores for cohesion, syntax,
vocabulary, phraseology, grammar, and conventions. The paper reports the reliability of these human judgments of proficiency.
The ELLIPSE corpus addresses many of the concerns raised about existing learner corpora, in particular by providing unique holistic and
analytic scores for each ELL essay. The corpus also includes limited demographic and individual difference data for each ELL.
Article outline
- 1. Introduction
- 2. The ELLIPSE Corpus
- 2.1 Initial corpus
- 2.2 Proficiency scoring
- 2.2 Final ELLIPSE corpus
- 2.2.1 Text statistics
- 2.2.2 Meta-data
- 2.2.3 Score distribution
- 3. Conclusion
- Open Material badge
- Notes
- References
Cited by (3)
Mahmoud, Somaia, Emad Nabil & Marwan Torki. 2024. Automatic Scoring of Arabic Essays: A Parameter-Efficient Approach for Grammatical Assessment. IEEE Access 12, pp. 142555 ff.

Paquot, Magali. 2024. Learner corpus research: a critical appraisal and roadmap for contributing (more) to SLA research agendas. Corpus Linguistics and Linguistic Theory 20:3, pp. 567 ff.

Thwaites, Peter, Nathan Vandeweerd & Magali Paquot. 2024. Crowdsourced Comparative Judgement for Evaluating Learner Texts: How Reliable are Judges Recruited from an Online Crowdsourcing Platform? Applied Linguistics.
This list is based on CrossRef data as of 19 November 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers.
Any errors therein should be reported to them.