This paper discusses the degree to which some of the most widely used association measures in corpus
linguistics lack validity, in the sense that they do not measure association as such but rather an amalgam of a lot of
frequency and a little association. The paper demonstrates these issues on the basis of hypothetical and actual corpus data and
outlines implications of the findings. I then outline how to design an association measure that measures only association and show
that its behavior supports the use of the log odds ratio as a true association-only measure, to be used separately from frequency.
In addition, this paper sets the stage for an analogous review of dispersion measures in corpus linguistics.
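The contrast between an association-only measure and a frequency-laden one can be illustrated with a minimal sketch. It is not the paper's own implementation: the function names, the 2x2 cell labels (a, b, c, d), the optional 0.5 smoothing constant, and the example counts are all assumptions made here for illustration. The sketch computes the log odds ratio and, for comparison, the log-likelihood statistic G2 for a hypothetical co-occurrence table, and then for the same table with every cell multiplied by 10.

```python
import math


def log_odds_ratio(a, b, c, d, smoothing=0.5):
    """Log odds ratio of a 2x2 co-occurrence table.

    a: word 1 with word 2        b: word 1 without word 2
    c: word 2 without word 1     d: neither word
    `smoothing` is added to every cell to avoid division by zero
    (an assumption of this sketch; pass 0 to switch it off).
    """
    a, b, c, d = (x + smoothing for x in (a, b, c, d))
    return math.log((a * d) / (b * c))


def log_likelihood_g2(a, b, c, d):
    """Log-likelihood ratio statistic G2 for the same 2x2 table."""
    n = a + b + c + d
    observed = [a, b, c, d]
    expected = [
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (c + d) * (a + c) / n,
        (c + d) * (b + d) / n,
    ]
    # Cells with an observed count of 0 contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


# Hypothetical counts, invented for illustration only.
small = (20, 80, 80, 820)
large = tuple(10 * x for x in small)  # same proportions, ten times the frequency

lor_small = log_odds_ratio(*small, smoothing=0)
lor_large = log_odds_ratio(*large, smoothing=0)
g2_small = log_likelihood_g2(*small)
g2_large = log_likelihood_g2(*large)

print(f"log odds ratio: {lor_small:.4f} vs {lor_large:.4f}")
print(f"G2:             {g2_small:.4f} vs {g2_large:.4f}")
```

Without smoothing, the log odds ratio is identical for both tables because only the proportions enter into it, whereas G2 grows in step with the overall frequency even though the strength of the association has not changed.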