This paper explores ways in which research into collocation should be improved. After a discussion of the parameters underlying the notion of collocation, the paper has three main parts. First, I argue that corpus linguistics would benefit from taking more seriously the understudied fact that collocations are not necessarily symmetric, contrary to what most association measures imply. I also introduce an association measure from the associative learning literature that can identify asymmetric collocations, and I show that it also distinguishes well between collocations with high and low association strengths. Second, I summarize some advantages of this measure and brainstorm about ways in which it can help re-examine previous studies as well as support further applications. Finally, I adopt a broader perspective and discuss a variety of ways in which all association measures – directional or not – in corpus linguistics should be improved in order for us to obtain better and more reliable results.