What do (most of) our dispersion measures measure (most)? Dispersion?
Stefan Th. Gries | University of California, Santa Barbara, USA | JLU Giessen
This paper argues that most of the most widely used dispersion measures in corpus linguistics are not particularly valid:
rather than measuring dispersion itself, they measure an amalgam of a lot of frequency and a little dispersion. The paper
demonstrates these issues on the basis of data from a variety of corpora. I then outline how to design a dispersion measure
that measures only dispersion and show that (i) it indeed captures information that differs from frequency in an intuitive
way and (ii) it predicts lexical decision times from the MALD database better than nearly all other measures in nearly all
corpora tested.
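
For orientation, the following is a minimal sketch of one conventional dispersion measure, DP (deviation of proportions). It is not the new measure the paper outlines, which the abstract does not specify. The function name `dp` and the toy counts are hypothetical and chosen only for illustration. DP is half the sum of the absolute differences between the share of a word's occurrences in each corpus part and that part's share of the corpus.

```python
def dp(word_counts, part_sizes):
    """Deviation of proportions (DP) for one word.

    word_counts[i]: occurrences of the word in corpus part i
    part_sizes[i]:  total number of tokens in corpus part i
    Returns a value near 0 for an even spread and near 1 for a clumped one.
    """
    total_word = sum(word_counts)
    total_size = sum(part_sizes)
    if total_word == 0:
        raise ValueError("word does not occur in the corpus")
    return 0.5 * sum(
        abs(c / total_word - s / total_size)
        for c, s in zip(word_counts, part_sizes)
    )


# Hypothetical toy corpus (invented numbers): four parts of equal size.
part_sizes = [1000, 1000, 1000, 1000]

even = dp([5, 5, 5, 5], part_sizes)      # same share in every part
clumped = dp([20, 0, 0, 0], part_sizes)  # all occurrences in one part
hapax = dp([1, 0, 0, 0], part_sizes)     # a single occurrence overall

print(even, clumped, hapax)
```

In this toy setting the single-occurrence word receives the same score as the clumped 20-occurrence word, because a word that occurs once cannot be spread over several parts. This is one simple way in which frequency can constrain the values such a measure is able to take.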