Chapter published in:
Language and Text: Data, models, information and applications
Edited by Adam Pawłowski, Jan Mačutek, Sheila Embleton and George Mikros
[Current Issues in Linguistic Theory 356] 2021
pp. 21–36
Term distance, frequency and collocations
Lars G. Johnsen | National Library of Norway
In this paper I study two co-occurrence measures, local to a particular corpus, for constructing collocations or relevance relations between words or terms. One is a distance measure, while the other uses co-occurrence windows of different sizes, one contained in the other. Both are discussed in relation to the common method of comparing co-occurrence measures within a particular corpus with those of a reference corpus. A practical consequence is that these measures may remove the need to compute a reference statistic, which can be computationally costly. I also believe that distance, as a measure in itself, is of theoretical interest: being different from frequency, it may add something new to collocation analysis.
Keywords: collocation, term distance, frequency, Bayes, probability, concordance
Article outline
- 1. Introduction
- 2. Δ-score and Pointwise Mutual Information
- 3. Data and technical method
- 4. Collocations
- 4.1 Frequency and context enlargement
- 4.2 Distance
- 4.2.1 The verb
- 4.2.2 The noun
- 5. Discussion
- Notes
- References
Published online: 22 December 2021
https://doi.org/10.1075/cilt.356.02joh