Ch. 4 | Exercise 4

The nouns shaving and descriptor have very similar frequencies in the Corpus of Contemporary American English (COCA): 513 and 515, respectively. Do you think these words are equally entrenched? Measure their dispersion using DPs, based on the frequencies in Table 4.1.

Table 4.1. Frequencies of shaving and descriptor in COCA

Noun	Spoken	Fiction	Academic	Press
shaving	25	175	40	273
descriptor	6	7	462	40

Create two vectors, shaving and descriptor, with the counts displayed in Table 4.1.

Compute the DPs and normalized DPs using the expected proportions exp_prop from the case study of colour terms in Section 4.3. Do the results meet your expectations?

Key

Create two vectors with the corpus counts:

> shaving <- c(25, 175, 40, 273)
> descriptor <- c(6, 7, 462, 40)
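As a quick check of the totals quoted in the exercise, summing each vector should return the COCA frequencies of the two nouns:

```r
# Counts per register (Spoken, Fiction, Academic, Press) from Table 4.1
shaving <- c(25, 175, 40, 273)
descriptor <- c(6, 7, 462, 40)

# Each total should match the COCA frequency given in the exercise
sum(shaving)     # 513
sum(descriptor)  # 515
```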

To compute the DPs and normalized DPs, first create vectors with the proportions of each word in the four registers:

> shaving_obs <- prop.table(shaving)
> shaving_obs
[1] 0.04873294 0.34113060 0.07797271 0.53216374
> descriptor_obs <- prop.table(descriptor)
> descriptor_obs
[1] 0.01165049 0.01359223 0.89708738 0.07766990
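For a plain vector, prop.table() simply divides every count by the vector's total, so the observed proportions always sum to 1. A minimal check of that equivalence (by_hand and same are illustrative names, not part of the exercise):

```r
shaving <- c(25, 175, 40, 273)

# Divide each register count by the word's total frequency
by_hand <- shaving / sum(shaving)

# prop.table() without a margin argument performs the same division
same <- all.equal(by_hand, prop.table(shaving))  # TRUE
print(same)

# The four observed proportions add up to 1
print(sum(by_hand))
```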

Next, compute simple DPs (see Section 4.3 on how to compute exp_prop):

> DP_shaving <- sum(abs(shaving_obs - exp_prop))/2
> DP_shaving
[1] 0.2750666
> DP_descriptor <- sum(abs(descriptor_obs - exp_prop))/2
> DP_descriptor
[1] 0.7008788
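Written as a formula, the R expression above sums the absolute differences between the observed proportions o_i (the elements of shaving_obs or descriptor_obs) and the expected proportions e_i (the elements of exp_prop) over the n registers, and halves the result:

```latex
\mathrm{DP} = \frac{1}{2} \sum_{i=1}^{n} \lvert o_i - e_i \rvert
```

A DP of 0 means that a word occurs in each register exactly in proportion to that register's expected share; larger values indicate more uneven dispersion.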

Finally, normalize each DP by dividing it by 1 - min(exp_prop), the largest value DP can take given these expected proportions:

> DP_shaving_norm <- DP_shaving/(1 - min(exp_prop))
> DP_shaving_norm
[1] 0.3415698
> DP_descriptor_norm <- DP_descriptor/(1 - min(exp_prop))
> DP_descriptor_norm
[1] 0.8703311
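The divisor follows from the DP formula. DP is largest when all occurrences of a word fall into the smallest corpus part k, with e_k = min_i e_i, so that o_k = 1 and every other o_i = 0:

```latex
\mathrm{DP}_{\max}
  = \frac{1}{2}\Big[(1 - e_k) + \sum_{i \neq k} e_i\Big]
  = \frac{1}{2}\big[(1 - e_k) + (1 - e_k)\big]
  = 1 - \min_i e_i,
\qquad
\mathrm{DP}_{\mathrm{norm}} = \frac{\mathrm{DP}}{1 - \min_i e_i}
```

Because both nouns are divided by the same constant, normalization rescales their DPs to the range from 0 to 1 but cannot change which of the two is more evenly dispersed.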

The noun descriptor has a much higher DP score than shaving (0.70 vs. 0.28, or 0.87 vs. 0.34 after normalization), which means that it is distributed far less evenly across the registers: almost 90% of its occurrences come from academic texts. Therefore, descriptor is less entrenched than shaving, even though the two nouns have nearly equal corpus frequencies.
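The steps of the key can be wrapped into one function that scores every row of a count matrix at once. The sketch below assumes a hypothetical helper, dp_scores(), that is not defined in Section 4.3. Because the actual exp_prop values from Section 4.3 are not reproduced here, the demonstration uses a clearly labelled placeholder (four equal-sized registers); its numbers therefore differ from the ones above.

```r
# Counts from Table 4.1 (rows: nouns; columns: Spoken, Fiction, Academic, Press)
counts <- rbind(
  shaving    = c(25, 175, 40, 273),
  descriptor = c(6, 7, 462, 40)
)

# Hypothetical helper: DP and normalized DP for every row of a count matrix.
# `exp_prop` holds the expected proportions of the corpus parts, in the same
# column order as `counts`, and must sum to 1.
dp_scores <- function(counts, exp_prop) {
  stopifnot(ncol(counts) == length(exp_prop),
            isTRUE(all.equal(sum(exp_prop), 1)))
  obs <- prop.table(counts, margin = 1)  # observed proportions within each row
  dp <- apply(obs, 1, function(o) sum(abs(o - exp_prop)) / 2)
  cbind(DP = dp, DP_norm = dp / (1 - min(exp_prop)))
}

# PLACEHOLDER expected proportions assuming four equal-sized registers;
# these are NOT the exp_prop values from Section 4.3
exp_prop_equal <- rep(0.25, 4)
scores <- dp_scores(counts, exp_prop_equal)
print(round(scores, 3))
```

Even under this placeholder, descriptor gets the higher DP (about 0.65 vs. 0.37). Calling dp_scores(counts, exp_prop) with the exp_prop vector from Section 4.3 applies the same formulas as the key and should reproduce its values.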