Ch. 10 | Exercise 4

Using the frequencies from Exercise 3, compute the log-likelihood score for each collexeme. Which adjective has the highest log-likelihood score, and which has the lowest? The total number of words in COCA at the time of the query was 464 020 256.

Key

Create the vectors with the frequencies b, c and d and the expected frequency of a (the objects a, total and adj are carried over from Exercise 3):

> b <- total - a
> c <- 28636 - a
> d <- 464020256 - a - b - c
> aExp <- (a + b)*(a + c)/(a + b + c + d)
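
As a rough illustration of what these four cells represent, the sketch below builds the 2 x 2 table for a single adjective. The frequencies a_demo and total_demo are hypothetical (they are not the values from Exercise 3), and 28636 is assumed here to be the total frequency of the construction, which is how the key appears to use it.

```r
# Hypothetical frequencies for one adjective (NOT the Exercise 3 data)
a_demo     <- 120        # adjective inside the construction (made up)
total_demo <- 50000      # adjective anywhere in the corpus (made up)
cx_freq    <- 28636      # assumed: total frequency of the construction
corpus_n   <- 464020256  # corpus size given in the exercise

b_demo <- total_demo - a_demo                   # adjective in other contexts
c_demo <- cx_freq - a_demo                      # other words in the construction
d_demo <- corpus_n - a_demo - b_demo - c_demo   # everything else

# Rows: word; columns: context. byrow = TRUE puts a and b in the first row.
tab <- matrix(c(a_demo, b_demo, c_demo, d_demo), nrow = 2, byrow = TRUE,
              dimnames = list(word = c("adjective", "other words"),
                              context = c("construction", "elsewhere")))

# Expected frequency of cell a under independence: row total * column total / N
a_exp_demo <- sum(tab[1, ]) * sum(tab[, 1]) / sum(tab)

print(tab)
print(a_exp_demo)
```

The four cells always sum to the corpus size, and the expected value is the same quantity that aExp holds for each adjective in the key.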

Compute the log-likelihood ratio and add the direction of the association, making the score negative when the observed frequency a is lower than the expected one:

> library(Rling)
> loglik <- LL.collostr(a, b, c, d)
> loglik1 <- ifelse(a < aExp, -loglik, loglik)
> names(loglik1) <- adj
> sort(loglik1, decreasing = TRUE)
       crazy        wrong      haywire        blank   unpunished 
22406.529538  7498.978088  4056.346901  3560.834797  3214.194939 
  undetected   stir-crazy        batty     hog-wild         sick 
 3060.021742   231.245176   210.705804   207.630165     4.910525

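The highest score belongs to crazy (about 22406.5), and the lowest to sick (about 4.9). All ten scores are positive, so every adjective occurs in the construction more often than expected.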
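
For readers who want to see what a signed log-likelihood score involves, here is a minimal base-R sketch of the standard statistic G2 = 2 * sum(O * log(O / E)) over the four cells of the table. The helper g2_signed is hypothetical and written only for illustration; it is assumed, not confirmed, that LL.collostr computes the same quantity, so small numerical differences are possible.

```r
# Hypothetical helper: signed log-likelihood ratio (G2) for 2 x 2 tables.
# a, b, c, d are numeric vectors of equal length (one table per element).
g2_signed <- function(a, b, c, d) {
  a <- as.numeric(a); b <- as.numeric(b)
  c <- as.numeric(c); d <- as.numeric(d)
  n <- a + b + c + d
  obs <- cbind(a, b, c, d)
  # Expected cell counts under independence: row total * column total / N
  expd <- cbind((a + b) * (a + c), (a + b) * (b + d),
                (c + d) * (a + c), (c + d) * (b + d)) / n
  # Cells with an observed count of zero contribute nothing to the sum
  terms <- ifelse(obs > 0, obs * log(obs / expd), 0)
  g2 <- 2 * rowSums(terms)
  # Negative sign when the observed a is below its expected value
  ifelse(a < expd[, 1], -g2, g2)
}

# Two made-up adjectives: one attracted to the construction, one repelled
demo <- g2_signed(a = c(120, 1),
                  b = c(49880, 49999),
                  c = c(28516, 28635),
                  d = c(463941740, 463941621))
print(demo)
```

With the vectors from the key, the call would be g2_signed(a, b, c, d), and the result could be named and sorted in the same way as loglik1.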