Edited by Michael P. Oakes and Meng Ji
[Studies in Corpus Linguistics 51] 2012
pp. 115–148
There are several ways to describe a single corpus. We consider how the frequencies of linguistic features may be quantified, for example in terms of their “average” occurrence, their dispersion among text segments, and whether they follow the familiar “bell curve” characteristic of a normal distribution. We describe how to determine the corpus size needed to measure these quantities with the required degree of confidence. We consider “aboutness”: the extent to which individual linguistic features characterise the corpus as a whole. We also describe vocabulary richness, the extent to which the author of a text keeps bringing in new vocabulary, and collocations: groups of words which are found together more often than one would expect by chance.