Article published in:
Current Issues in Phraseology. Edited by Sebastian Hoffmann, Bettina Fischer-Starcke and Andrea Sand
[Benjamins Current Topics 74] 2015
pp. 135–164
50-something years of work on collocations
What is or should be next …
This paper explores ways in which research into collocation should be improved. Following a discussion of the parameters underlying the notion of collocation, the paper has three main parts. First, I argue that corpus linguistics would benefit from taking more seriously the understudied fact that collocations are not necessarily symmetric, even though most association measures treat them as if they were. I then introduce an association measure from the associative learning literature that can identify asymmetric collocations, and I show that it also distinguishes well between collocations with high and low association strengths. Second, I summarize some advantages of this measure and consider ways in which it can help re-examine previous studies and support further applications. Finally, I adopt a broader perspective and discuss a variety of ways in which all association measures in corpus linguistics, directional or not, should be improved so that we obtain better and more reliable results.
Keywords: association measure, collocation, directionality, dispersion, ΔP (delta P)
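The contrast the abstract draws between symmetric association measures and a directional one can be illustrated with a small sketch. Assuming the standard associative-learning definition of ΔP, ΔP(outcome | cue) = P(outcome | cue) − P(outcome | no cue), the code below computes ΔP in both directions from a 2×2 co-occurrence table for a word pair (w1, w2) and compares it with pointwise mutual information (PMI), one common symmetric measure. The class `Table2x2`, the function names and the counts are hypothetical illustrations, not the author's implementation or data.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Table2x2:
    """Hypothetical co-occurrence counts for a word pair (w1, w2).

    a: w1 and w2 together      b: w1 without w2
    c: w2 without w1           d: neither w1 nor w2
    """
    a: int
    b: int
    c: int
    d: int


def delta_p_w2_given_w1(t: Table2x2) -> float:
    """ΔP(w2 | w1) = P(w2 | w1) - P(w2 | not w1): how well w1 predicts w2."""
    return t.a / (t.a + t.b) - t.c / (t.c + t.d)


def delta_p_w1_given_w2(t: Table2x2) -> float:
    """ΔP(w1 | w2) = P(w1 | w2) - P(w1 | not w2): how well w2 predicts w1."""
    return t.a / (t.a + t.c) - t.b / (t.b + t.d)


def pmi(t: Table2x2) -> float:
    """PMI = log2(P(w1, w2) / (P(w1) * P(w2))); symmetric in w1 and w2."""
    n = t.a + t.b + t.c + t.d
    return math.log2(t.a * n / ((t.a + t.b) * (t.a + t.c)))


# Invented counts: w1 is almost always followed by w2,
# but w2 frequently occurs without w1.
t = Table2x2(a=50, b=10, c=450, d=99_490)
# The same pair with the roles of w1 and w2 exchanged.
swapped = Table2x2(a=t.a, b=t.c, c=t.b, d=t.d)

print(round(delta_p_w2_given_w1(t), 3))  # 0.829
print(round(delta_p_w1_given_w2(t), 3))  # 0.1
print(pmi(t) == pmi(swapped))            # True
```

With these invented counts, w1 predicts w2 strongly (ΔP ≈ 0.83) while w2 predicts w1 only weakly (ΔP ≈ 0.10), yet exchanging the two words leaves PMI unchanged, so a symmetric measure cannot register the asymmetry.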
Published online: 10 July 2015
https://doi.org/10.1075/bct.74.07gri