How much vocabulary is needed to use a concordance?
Vocabulary load is a predictor of comprehension and a common concern in relation to learner use of concordances; however, vocabulary load figures for whole texts have limited relevance to learner use of concordances. This paper explores the average vocabulary load of the citations (or lines) in a concordance, reflecting how learners use concordances as reading or reference resources. Non-parametric tests are used to compare the vocabulary loads of citations from three authentic written corpora and a corpus of graded readers. The results indicate that citations from authentic corpora have an average vocabulary load of 4,000–5,000 word families, that there are reliable differences in vocabulary load between citations from different corpora, and that the magnitude of difference between citations from authentic corpora can be equivalent to the magnitude of difference between authentic corpora and graded reader corpora. The paper concludes with a discussion of the results in relation to language learner use of concordances.
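To make the idea of a per-citation vocabulary load concrete, the following is a minimal sketch of one way a single concordance line could be scored. It is not the paper's procedure, which this abstract does not describe. The sketch assumes that each word can be looked up in a list of 1,000-word-family frequency bands and that the load is the lowest band at which a target share of the line's tokens is covered. The names `TOY_BANDS`, `tokenize` and `vocabulary_load`, the tiny band mapping, and the 98% default threshold are all illustrative assumptions.

```python
import re

# Hypothetical toy mapping from word form to frequency band
# (band 1 = the most frequent 1,000 word families). Real band
# lists are far larger; these values are invented for illustration.
TOY_BANDS = {
    "the": 1, "was": 1, "to": 1, "house": 1,
    "committee": 2, "reluctant": 3, "accord": 4, "ratify": 6,
}

def tokenize(citation):
    """Lower-case the citation and return its alphabetic tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", citation.lower())

def vocabulary_load(citation, bands, coverage_pct=98):
    """Return the lowest band at which coverage_pct of tokens is covered.

    Returns None if the citation has no tokens, or if tokens missing
    from `bands` make the coverage target unreachable.
    """
    tokens = tokenize(citation)
    if not tokens:
        return None
    # Integer ceiling of coverage_pct% of the token count.
    needed = (coverage_pct * len(tokens) + 99) // 100
    known = sorted(bands[t] for t in tokens if t in bands)
    if len(known) < needed:
        return None
    # The needed-th smallest band is the lowest band covering enough tokens.
    return known[needed - 1]

line = "The committee was reluctant to ratify the accord"
print(vocabulary_load(line, TOY_BANDS))      # 6
print(vocabulary_load(line, TOY_BANDS, 75))  # 3
```

Under these assumptions, a reader would need roughly the first 6,000 word families to reach 98% coverage of the example line, but only the first 3,000 to reach 75%. Averaging such scores over a sample of citations would give a corpus-level figure of the kind reported above.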
Article outline
- 1. Introduction
- 2. The vocabulary demands of learner use of concordances
- 2.1 Vocabulary and reading
- 2.2 Concordances, reading and vocabulary load
- 3. Data and method
- 3.1 Corpora
- 3.2 Expanded word lists
- 3.3 Procedure
- 3.3.1 Power analysis
- 3.3.2 Developing the sampling frame
- 3.3.3 Scoring
- Vocabulary coverage level
- Mean word frequency
- 3.3.4 Extraction software
- 4. Results
- 4.1 Study one: Three authentic corpora
- 4.2 Study two: Replication
- 4.3 Study three: Authentic corpora compared with a graded corpus
- 5. Pedagogical implications and discussion
- 6. Conclusions
- Acknowledgements
- Note
- References
Cited by (3)
Ballance, Oliver James. 2021. Narrow reading, vocabulary load and collocations in context: Exploring lexical repetition in concordances from a pedagogical perspective. ReCALL 33:1, pp. 4 ff.
Crosthwaite, Peter, Luciana & Martin Schweinberger. 2021. Voices from the periphery: Perceptions of Indonesian primary vs secondary pre-service teacher trainees about corpora and data-driven learning in the L2 English classroom. Applied Corpus Linguistics 1:1, pp. 100003 ff.