This paper examines the effectiveness of Juilland’s D as a measure of vocabulary dispersion in large corpora. Through a series of experiments using the British National Corpus (BNC), we explore the influence of three variables: the number of corpus parts used to compute D, the frequency of the target word, and the distribution of that word across the corpus. The experiments demonstrate that the effective range of D is greatly reduced when computations are based on a large number of corpus parts: even words with highly skewed distributions receive D values that indicate a relatively uniform distribution. We also briefly explore an alternative measure, Gries’ DP (Gries 2008), and show that it is a more reliable and effective measure of dispersion in a large corpus divided into many parts. In conclusion, we discuss the implications of these findings for quantitative methods used to create vocabulary lists, as well as for research questions in other areas of corpus linguistics.
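The contrast between the two measures can be illustrated with a small sketch. It is not the paper's experimental code: it assumes the commonly cited definitions D = 1 − V/√(n − 1), where V is the coefficient of variation of the word's frequencies across n equal-sized parts, and DP = ½ Σ|observed share − expected share|. The function names, the toy word, and the part size of 1,000 tokens are hypothetical choices made for illustration.

```python
import math


def juilland_d(freqs):
    """Juilland's D for a word's frequencies in n equal-sized corpus parts.

    Assumed definition: D = 1 - V / sqrt(n - 1), with V = sd / mean
    (population standard deviation).
    """
    n = len(freqs)
    mean = sum(freqs) / n
    if n < 2 or mean == 0:
        raise ValueError("need at least two parts and a non-zero frequency")
    sd = math.sqrt(sum((f - mean) ** 2 for f in freqs) / n)
    return 1 - (sd / mean) / math.sqrt(n - 1)


def gries_dp(freqs, part_sizes):
    """Gries' DP: half the summed absolute difference between the share of
    the word's occurrences in each part and that part's share of the corpus.
    0 means perfectly even; values near 1 mean highly uneven.
    """
    total_freq = sum(freqs)
    total_size = sum(part_sizes)
    return 0.5 * sum(
        abs(f / total_freq - s / total_size)
        for f, s in zip(freqs, part_sizes)
    )


def skewed_word(n_parts, one_in=10, freq_per_part=10):
    """Hypothetical toy word: occurs only in every `one_in`-th share of the
    parts (here 10% of them), with a constant frequency where it occurs."""
    k = n_parts // one_in
    return [freq_per_part] * k + [0] * (n_parts - k)


for n_parts in (10, 1000):
    freqs = skewed_word(n_parts)
    sizes = [1000] * n_parts  # assumed equal part sizes, in tokens
    print(n_parts, round(juilland_d(freqs), 3), round(gries_dp(freqs, sizes), 3))
```

Under these assumptions, the same skew (the word is absent from 90% of the parts) yields a D of 0 with 10 parts but a D of roughly 0.9 with 1,000 parts, whereas DP stays at 0.9 in both cases. This mirrors the compression of D's effective range described above.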