What do (most of) our dispersion measures measure (most)? Dispersion?

Gries, Stefan Th.

doi:10.1075/jsls.21029.gri

Article published in:

Journal of Second Language Studies
Vol. 5:2 (2022), pp. 171–205

Stefan Th. Gries | University of California, Santa Barbara, USA | JLU Giessen

This paper argues that most of the most widely used dispersion measures in corpus linguistics are not particularly valid: rather than measuring dispersion, they measure an amalgam of a lot of frequency and a little dispersion. The paper demonstrates this issue with data from a variety of corpora. I then outline how to design a dispersion measure that measures only dispersion and show that this measure (i) indeed captures information that differs from frequency in an intuitive way and (ii) predicts lexical decision times from the MALD database better than nearly all other measures in nearly all corpora tested.

Keywords: dispersion, frequency, association, range, Juilland’s D, Gries’s DP, generalized additive modeling
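
To make the measures named in the abstract and keywords concrete, here is a minimal sketch of three established ones (range, Juilland's D, and Gries's DP) in their commonly cited textbook formulations. The function names and the toy corpus are assumptions made purely for illustration; they do not come from the article, and the sketch does not implement the new frequency-independent measure the paper proposes.

```python
from math import sqrt


def corpus_range(freqs):
    """Range: number of corpus parts in which the word occurs at least once."""
    return sum(1 for v in freqs if v > 0)


def juilland_d(freqs, sizes):
    """Juilland's D: 1 - V / sqrt(n - 1), where V is the coefficient of
    variation (population sd / mean) of the per-part relative frequencies.
    Assumes at least two parts and a word frequency greater than zero."""
    n = len(freqs)
    rel = [v / s for v, s in zip(freqs, sizes)]
    mean = sum(rel) / n
    sd = sqrt(sum((p - mean) ** 2 for p in rel) / n)
    return 1 - (sd / mean) / sqrt(n - 1)


def gries_dp(freqs, sizes):
    """Gries's DP: half the summed absolute differences between the observed
    share of the word in each part and the share of the corpus that part
    makes up. 0 means perfectly even; higher values mean more clumped."""
    total_f = sum(freqs)
    total_s = sum(sizes)
    return 0.5 * sum(abs(v / total_f - s / total_s)
                     for v, s in zip(freqs, sizes))


# Hypothetical toy corpus: five parts of 1,000 words each.
sizes = [1000] * 5
even = [2, 2, 2, 2, 2]       # 10 tokens spread evenly
clumped = [10, 0, 0, 0, 0]   # the same 10 tokens, all in one part

for label, freqs in (("even", even), ("clumped", clumped)):
    print(label,
          corpus_range(freqs),
          round(juilland_d(freqs, sizes), 3),
          round(gries_dp(freqs, sizes), 3))
```

Both toy words have the same frequency, so any difference in the values reflects distribution alone. In real corpora, frequency and distribution vary together, which is the confound the abstract describes.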

Article outline

1. Introduction
2. A brief recap: G² reacts more to frequency than to association
3. Dispersion measure: What do they measure and how?
- 3.1 Existing measures
- 3.2 A new measure: Motivation and development
- 3.3 Perspective 1: DP_nofreq measures dispersion, not frequency
- 3.4 Perspective 2: DP_nofreq helps predicting external data
4. Two short excurses
- 4.1 Excursus 1: range_nofreq
- 4.2 Excursus 2: fast bowler vs. fast food
5. Concluding remarks
Notes
References

Published online: 30 November 2021

https://doi.org/10.1075/jsls.21029.gri
