Chapter 5 | Exercise 1

Consider the imagery scores of the high- and low-frequency nouns, available as the imag variable in the data frames pym_high and pym_low. Which group of nouns would you expect to have higher imagery scores? Compute the means and medians. Use an appropriate parametric and non-parametric test to check if the difference between the groups is statistically significant. Are the results of the tests similar?

Key

In this exercise we will test a non-directional hypothesis of difference between the groups. First, load the package and the data. Next, compute the means and the medians:

> library(Rling)
> data(pym_high)
> data(pym_low)
> mean(pym_high$imag)
[1] 5.1706
> mean(pym_low$imag)
[1] 4.884902
> median(pym_high$imag)
[1] 5.05
> median(pym_low$imag)
[1] 5.23
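
The four summaries can also be computed in one pass. The sketch below uses two short hypothetical vectors (high and low are made-up ratings, not the values in pym_high and pym_low), chosen so that one group has the higher mean and the other the higher median, the same pattern as in the output above.

```r
# Hypothetical imagery ratings for illustration only,
# NOT the values stored in pym_high$imag or pym_low$imag.
high <- c(4.1, 4.6, 5.0, 5.1, 6.9, 6.8)
low  <- c(3.0, 3.2, 5.2, 5.3, 5.4, 5.5)

# Collect mean and median for each group in a single matrix:
# rows are the statistics, columns are the groups.
groups <- list(high = high, low = low)
summaries <- sapply(groups, function(x) c(mean = mean(x), median = median(x)))
print(summaries)
```

With the real data, the same call would take list(high = pym_high$imag, low = pym_low$imag) instead of the made-up vectors.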

The high-frequency nouns have a higher mean (5.17 vs. 4.88) but a lower median (5.05 vs. 5.23). According to the parametric t-test, the difference between the means is not statistically significant (p = 0.2875):

> t.test(pym_high$imag, pym_low$imag)

        Welch Two Sample t-test

data:  pym_high$imag and pym_low$imag
t = 1.0694, df = 98.463, p-value = 0.2875
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.2444499  0.8158460
sample estimates:
mean of x mean of y 
 5.170600  4.884902
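
To see where the t and df values in such output come from, the Welch statistic can be reproduced by hand. This is a minimal sketch on hypothetical data (x and y are made-up vectors, not the imagery scores); it assumes the default unpaired, unequal-variance version of t.test shown above.

```r
# Hypothetical scores for illustration only.
x <- c(5.2, 4.8, 6.1, 5.5, 4.9, 6.3, 5.0)
y <- c(4.7, 5.1, 3.9, 5.6, 4.2, 4.4)

# Squared standard error of each mean (variances are not pooled).
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic: difference in means over its standard error.
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation of the degrees of freedom.
df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Two-tailed p-value from the t distribution.
p_val <- 2 * pt(-abs(t_stat), df)

# Compare with the built-in test.
res <- t.test(x, y)
print(c(by_hand = t_stat, t.test = unname(res$statistic)))
```

The fractional degrees of freedom (such as df = 98.463 above) are a consequence of this approximation.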

According to the non-parametric Wilcoxon test, there is no statistically significant difference between the groups, either:

> wilcox.test(pym_high$imag, pym_low$imag, correct = FALSE, conf.int = TRUE)

        Wilcoxon rank sum test

data:  pym_high$imag and pym_low$imag
W = 1441.5, p-value = 0.258
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -0.2000116  0.7400222
sample estimates:
difference in location 
             0.2000102
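
Two details of this output are easier to read with a small worked example. First, W is the number of (x, y) pairs in which the x value is larger, with ties counting one half, which would explain a fractional value such as 1441.5. Second, the "difference in location" is not the difference between the two medians but an estimate of the median of all pairwise differences, so it can be positive even when the first group has the lower median. The sketch below uses hypothetical, tie-free data (x and y are made up), for which wilcox.test computes exact results.

```r
# Hypothetical scores for illustration only (no ties, small samples).
x <- c(5.2, 4.8, 6.1, 5.5, 4.9, 6.3, 5.0)
y <- c(4.7, 5.1, 3.9, 5.6, 4.2, 4.4)

# W from ranks: sum of the ranks of x in the pooled sample,
# minus the smallest value that sum could take.
r <- rank(c(x, y))
w_ranks <- sum(r[seq_along(x)]) - length(x) * (length(x) + 1) / 2

# Equivalent definition: count the pairs where x exceeds y.
w_pairs <- sum(outer(x, y, ">"))

# Median of all pairwise differences x_i - y_j.
hl_est <- median(outer(x, y, "-"))

res <- wilcox.test(x, y, conf.int = TRUE)
print(c(W = unname(res$statistic), by_ranks = w_ranks, by_pairs = w_pairs))
print(c(estimate = unname(res$estimate), by_hand = hl_est))
```

For larger samples or data with ties, wilcox.test switches to a normal approximation and finds the estimate numerically, so the reported value may differ slightly from the plain median of differences.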

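The results of the two tests are thus similar: the p-values (0.2875 and 0.258) are close, and neither falls below the conventional 0.05 threshold. Note that both tests are unpaired (independent) and two-tailed.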
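
The unpaired, two-tailed settings are the defaults of t.test, so they do not appear in the calls above. The following sketch spells them out on hypothetical data (x and y are made-up vectors) and, for contrast, shows a directional variant, which is not what this exercise asks for.

```r
# Hypothetical scores for illustration only.
x <- c(5.2, 4.8, 6.1, 5.5, 4.9, 6.3, 5.0)
y <- c(4.7, 5.1, 3.9, 5.6, 4.2, 4.4)

# Default call, as used in the key.
default_t <- t.test(x, y)

# The same test with the defaults written out explicitly.
explicit_t <- t.test(x, y, paired = FALSE, alternative = "two.sided",
                     var.equal = FALSE)

# A one-tailed alternative: is the mean of x greater than the mean of y?
greater_t <- t.test(x, y, alternative = "greater")

print(c(default = default_t$p.value,
        explicit = explicit_t$p.value,
        one_tailed = greater_t$p.value))
```

Because the mean of x is larger than the mean of y here, the one-tailed p-value is half the two-tailed one. wilcox.test accepts the same paired and alternative arguments.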