
# Chapter 4 | Exercise 4

The nouns shaving and descriptor have very similar frequencies in the Corpus of Contemporary American English (COCA), namely, 513 and 515. Do you think these words are equally entrenched? Measure their dispersion with the help of DPs on the basis of the frequencies in Table 4.1.

Table 4.1. Frequencies of shaving and descriptor in COCA
shaving 25 175 40 273
descriptor 6 7 462 40
a. Create two vectors, `shaving` and `descriptor`, with the counts displayed in Table 4.1.

b. Compute the DPs and normalized DPs using the expected proportions `exp_prop` from the case study of colour terms in Section 4.3. Do the results meet your expectations?

Create two vectors with the corpus counts:

```
> shaving <- c(25, 175, 40, 273)
> descriptor <- c(6, 7, 462, 40)
```
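As an optional sanity check, which is not part of the original solution, the totals of the two vectors can be compared with the corpus frequencies quoted in the exercise (513 and 515). A minimal sketch:

```r
# Counts from Table 4.1, one value per register
shaving <- c(25, 175, 40, 273)
descriptor <- c(6, 7, 462, 40)

# Totals should equal the COCA frequencies quoted in the exercise
sum(shaving)     # 513
sum(descriptor)  # 515
```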

To compute the DPs and normalized DPs, first create vectors with the observed proportions of each word across the four registers:

```
> shaving_obs <- prop.table(shaving)
> shaving_obs
[1] 0.04873294 0.34113060 0.07797271 0.53216374
> descriptor_obs <- prop.table(descriptor)
> descriptor_obs
[1] 0.01165049 0.01359223 0.89708738 0.07766990
```

Next, compute the simple DPs (see Section 4.3 for how to compute `exp_prop`):

```
> DP_shaving <- sum(abs(shaving_obs - exp_prop))/2
> DP_shaving
[1] 0.2750666
> DP_descriptor <- sum(abs(descriptor_obs - exp_prop))/2
> DP_descriptor
[1] 0.7008788
```
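The DP computation can also be wrapped in a small helper function. The sketch below is illustrative only: the function name `dp` and the equal expected proportions in `exp_prop_demo` are assumptions made for this example. They are not the `exp_prop` values from Section 4.3, so the resulting numbers differ from those reported above.

```r
# Hypothetical helper: DP is half the summed absolute differences
# between observed and expected proportions
dp <- function(counts, exp_prop) {
  stopifnot(length(counts) == length(exp_prop),
            isTRUE(all.equal(sum(exp_prop), 1)))
  obs_prop <- prop.table(counts)
  sum(abs(obs_prop - exp_prop)) / 2
}

# ASSUMPTION: four equally sized corpus parts (not the real exp_prop)
exp_prop_demo <- rep(0.25, 4)

dp_shaving_demo <- dp(c(25, 175, 40, 273), exp_prop_demo)
dp_descriptor_demo <- dp(c(6, 7, 462, 40), exp_prop_demo)

# Even under this simplified assumption, descriptor has the higher DP
dp_descriptor_demo > dp_shaving_demo
```

Passing the expected proportions as an argument keeps the helper reusable: once the real `exp_prop` from Section 4.3 is available, it can be supplied in place of `exp_prop_demo`.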

Finally, perform normalization:

```
> DP_shaving_norm <- DP_shaving/(1 - min(exp_prop))
> DP_shaving_norm
[1] 0.3415698
> DP_descriptor_norm <- DP_descriptor/(1 - min(exp_prop))
> DP_descriptor_norm
[1] 0.8703311
```
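The divisor `1 - min(exp_prop)` is the largest value that DP can take. It is reached when every occurrence of a word falls in the smallest corpus part, so dividing by it rescales DP to the range from 0 to 1. The sketch below illustrates this with made-up numbers: `exp_prop_demo` and `clumped` are hypothetical and are not taken from Section 4.3.

```r
# ASSUMPTION: hypothetical expected proportions for four corpus parts
exp_prop_demo <- c(0.2, 0.3, 0.1, 0.4)

# Hypothetical word whose 100 occurrences all fall in the smallest part
clumped <- c(0, 0, 100, 0)

dp_clumped <- sum(abs(prop.table(clumped) - exp_prop_demo)) / 2
dp_max <- 1 - min(exp_prop_demo)

# DP equals its theoretical maximum, so the normalized DP is 1
dp_clumped_norm <- dp_clumped / dp_max
all.equal(dp_clumped_norm, 1)  # TRUE
```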

The noun descriptor has a much higher DP than shaving (normalized DP of 0.87 versus 0.34), which means that it is far less evenly distributed across the registers: about 90% of its occurrences fall in a single register. Descriptor can therefore be considered less entrenched than shaving, even though the two nouns have nearly equal corpus frequencies.