Chapter 4 | Exercise 4
The nouns shaving and descriptor have very similar frequencies in the Corpus of Contemporary American English (COCA): 513 and 515 occurrences, respectively. Do you think these words are equally entrenched? Measure their dispersion using the deviation of proportions (DP), based on the frequencies in Table 4.1.
| | Spoken | Fiction | Academic | Press |
|---|---|---|---|---|
| shaving | 25 | 175 | 40 | 273 |
| descriptor | 6 | 7 | 462 | 40 |
Create two vectors, shaving and descriptor, with the counts displayed in Table 4.1.
Compute the DPs and normalized DPs using the expected proportions exp_prop from the case study of colour terms in Section 4.3. Do the results meet your expectations?
Create two vectors with the corpus counts:
> shaving <- c(25, 175, 40, 273)
> descriptor <- c(6, 7, 462, 40)
To compute the DPs and normalized DPs, first compute the proportion of each word's occurrences that falls in each of the four registers:
> shaving_obs <- prop.table(shaving)
> shaving_obs
[1] 0.04873294 0.34113060 0.07797271 0.53216374
> descriptor_obs <- prop.table(descriptor)
> descriptor_obs
[1] 0.01165049 0.01359223 0.89708738 0.07766990
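As a quick sanity check (not part of the original solution), the vector totals should match the corpus frequencies cited in the exercise, and each set of observed proportions should sum to 1. A minimal sketch, written without the console prompt so that it can be pasted into a script:

```r
shaving <- c(25, 175, 40, 273)
descriptor <- c(6, 7, 462, 40)

# Totals should equal the COCA frequencies given in the exercise (513 and 515)
stopifnot(sum(shaving) == 513, sum(descriptor) == 515)

# prop.table() divides each count by the vector total, so each result sums to 1
shaving_obs <- prop.table(shaving)
descriptor_obs <- prop.table(descriptor)
stopifnot(isTRUE(all.equal(sum(shaving_obs), 1)),
          isTRUE(all.equal(sum(descriptor_obs), 1)))
```

If either stopifnot() call fails, a count was most likely mistyped when copying from Table 4.1.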
Next, compute simple DPs (see Section 4.3 on how to compute exp_prop):
> DP_shaving <- sum(abs(shaving_obs - exp_prop))/2
> DP_shaving
[1] 0.2750666
> DP_descriptor <- sum(abs(descriptor_obs - exp_prop))/2
> DP_descriptor
[1] 0.7008788
Finally, perform normalization:
> DP_shaving_norm <- DP_shaving/(1 - min(exp_prop))
> DP_shaving_norm
[1] 0.3415698
> DP_descriptor_norm <- DP_descriptor/(1 - min(exp_prop))
> DP_descriptor_norm
[1] 0.8703311
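The three steps above (observed proportions, DP, normalization) can also be wrapped in a single helper. The sketch below is only an illustration: the function name dp_scores is hypothetical, and exp_prop_demo is a placeholder that assumes four equally sized sub-corpora. It is not the exp_prop vector from Section 4.3, so the values it returns will differ from those shown above.

```r
# Hypothetical helper (not from the book): returns DP and normalized DP
# for a vector of counts and a vector of expected proportions.
dp_scores <- function(counts, exp_prop) {
  stopifnot(length(counts) == length(exp_prop),
            isTRUE(all.equal(sum(exp_prop), 1)))
  obs_prop <- prop.table(counts)            # observed proportions per register
  dp <- sum(abs(obs_prop - exp_prop)) / 2   # simple DP
  c(DP = dp, DP_norm = dp / (1 - min(exp_prop)))
}

# Placeholder only: four equally sized sub-corpora. This is an assumption
# for illustration; substitute the exp_prop vector from Section 4.3.
exp_prop_demo <- rep(0.25, 4)

dp_scores(c(25, 175, 40, 273), exp_prop_demo)  # shaving
dp_scores(c(6, 7, 462, 40), exp_prop_demo)     # descriptor
```

The normalization step matters because the largest value that simple DP can reach is 1 - min(exp_prop), which occurs when a word appears only in the smallest sub-corpus. Dividing by that quantity rescales the score so that its maximum is 1.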
The noun descriptor has a much higher DP score than shaving (0.70 vs. 0.28; normalized, 0.87 vs. 0.34), which means that it is distributed far less evenly across the registers: about 90% of its occurrences come from academic texts. The word is therefore likely to be less entrenched, even though the two nouns have nearly equal corpus frequencies.