Ch. 4 | Exercise 4

The nouns shaving and descriptor have very similar frequencies in the Corpus of Contemporary American English (COCA), namely, 513 and 515. Do you think these words are equally entrenched? Measure their dispersion with the help of DPs on the basis of the frequencies in Table 4.1.

Table 4.1. Frequencies of shaving and descriptor in COCA
             Spoken  Fiction  Academic  Press
shaving          25      175        40    273
descriptor        6        7       462     40

a. Create two vectors, shaving and descriptor, with the counts displayed in Table 4.1.

b. Compute the DPs and normalized DPs using the expected proportions exp_prop from the case study of colour terms in Section 4.3. Do the results meet your expectations?

Create two vectors with the corpus counts:

> shaving <- c(25, 175, 40, 273)
> descriptor <- c(6, 7, 462, 40)

To compute the DPs and normalized DPs, first compute each word's observed proportions across the four registers:

> shaving_obs <- prop.table(shaving)
> shaving_obs
[1] 0.04873294 0.34113060 0.07797271 0.53216374
> descriptor_obs <- prop.table(descriptor)
> descriptor_obs
[1] 0.01165049 0.01359223 0.89708738 0.07766990
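As an optional check, the same proportions can be obtained by dividing each count by the vector total. The sketch below also names the elements by register so that the output is easier to read. The labels come from Table 4.1, but the helper object registers and the use of named vectors are additions for illustration, not part of the original solution.

```r
# Register labels from Table 4.1; 'registers' is a helper name introduced here.
registers <- c("Spoken", "Fiction", "Academic", "Press")

shaving <- setNames(c(25, 175, 40, 273), registers)
descriptor <- setNames(c(6, 7, 462, 40), registers)

# prop.table() on a vector is the same as dividing by the sum of the counts.
shaving_obs <- shaving / sum(shaving)
descriptor_obs <- descriptor / sum(descriptor)
stopifnot(isTRUE(all.equal(shaving_obs, prop.table(shaving))))

# Observed proportions side by side, rounded for readability.
round(rbind(shaving = shaving_obs, descriptor = descriptor_obs), 3)
```

Named vectors can be used in the later steps exactly like the unnamed ones, provided the registers are listed in the same order as in exp_prop.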

Next, compute simple DPs (see Section 4.3 on how to compute exp_prop):

> DP_shaving <- sum(abs(shaving_obs - exp_prop))/2
> DP_shaving
[1] 0.2750666
> DP_descriptor <- sum(abs(descriptor_obs - exp_prop))/2
> DP_descriptor
[1] 0.7008788

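Finally, normalize each DP by dividing it by 1 minus the smallest expected proportion, which is the largest value that DP can take for this corpus: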

> DP_shaving_norm <- DP_shaving/(1 - min(exp_prop))
> DP_shaving_norm
[1] 0.3415698
> DP_descriptor_norm <- DP_descriptor/(1 - min(exp_prop))
> DP_descriptor_norm
[1] 0.8703311
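Since the same three steps are repeated for each word, they can be wrapped in a small function. The following is only a sketch: dp_scores is a hypothetical helper name, and exp_prop_demo is a placeholder that assumes four equally sized registers. It is not the exp_prop vector from Section 4.3, so the numbers it produces will differ from those reported above; substitute the real exp_prop to reproduce them.

```r
# Hypothetical helper (not from the book): returns the simple and normalized DP
# for a vector of counts, given expected proportions for the same corpus parts.
dp_scores <- function(counts, exp_prop) {
  stopifnot(length(counts) == length(exp_prop),
            isTRUE(all.equal(sum(exp_prop), 1)))
  obs_prop <- counts / sum(counts)            # observed proportions
  dp <- sum(abs(obs_prop - exp_prop)) / 2     # simple DP
  c(DP = dp, DP_norm = dp / (1 - min(exp_prop)))
}

shaving <- c(25, 175, 40, 273)
descriptor <- c(6, 7, 462, 40)

# PLACEHOLDER: equal-sized registers, for illustration only.
# Replace with the exp_prop vector computed in Section 4.3.
exp_prop_demo <- rep(0.25, 4)

scores <- rbind(shaving = dp_scores(shaving, exp_prop_demo),
                descriptor = dp_scores(descriptor, exp_prop_demo))
round(scores, 3)
```

Even with the placeholder proportions, descriptor receives the higher DP, because most of its occurrences fall into a single register.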

The noun descriptor has a much higher DP than shaving (0.70 vs. 0.28; normalized, 0.87 vs. 0.34), which means that it is much less evenly distributed across registers: about 90% of its occurrences come from the Academic register. It can therefore be regarded as less entrenched, even though the two nouns have nearly equal corpus frequencies.