Chapter 5 | Exercise 1
Consider the imagery scores of the high- and low-frequency nouns, available as the imag variable in the data frames pym_high and pym_low. Which group of nouns would you expect to have higher imagery scores? Compute the means and medians. Use an appropriate parametric and non-parametric test to check if the difference between the groups is statistically significant. Are the results of the tests similar?
In this exercise we will test a non-directional hypothesis of difference between the groups. First, load the package and the data. Next, compute the means and the medians:
> library(Rling)
> data(pym_high)
> data(pym_low)
> mean(pym_high$imag)
[1] 5.1706
> mean(pym_low$imag)
[1] 4.884902
> median(pym_high$imag)
[1] 5.05
> median(pym_low$imag)
[1] 5.23
The high-frequency nouns have a higher mean but a lower median than the low-frequency nouns. According to the parametric t-test, the difference between the means is not statistically significant:
> t.test(pym_high$imag, pym_low$imag)
Welch Two Sample t-test
data: pym_high$imag and pym_low$imag
t = 1.0694, df = 98.463, p-value = 0.2875
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.2444499 0.8158460
sample estimates:
mean of x mean of y
5.170600 4.884902
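The fractional degrees of freedom in the output (df = 98.463) come from the Welch correction, which t.test() applies by default because it does not assume equal variances. As a rough sketch of what the function computes, the statistic, the degrees of freedom and the two-tailed p-value can be reproduced by hand. The vectors x and y below are invented for illustration and are not the pym_high and pym_low scores.

```r
# Hypothetical imagery-like scores, invented for illustration only
# (these are NOT the pym_high / pym_low data)
x <- c(5.1, 6.3, 4.2, 5.8, 6.0, 4.9, 5.5, 3.9)
y <- c(4.4, 5.2, 3.8, 5.9, 4.1, 4.7, 5.0)

# Each group contributes its own variance divided by its sample size
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic: difference in means over its standard error
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation of the degrees of freedom
df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Two-tailed p-value from the t distribution
p_val <- 2 * pt(-abs(t_stat), df)

# Compare with the built-in test (default: var.equal = FALSE)
res <- t.test(x, y)
all.equal(unname(res$statistic), t_stat)
all.equal(unname(res$parameter), df)
all.equal(res$p.value, p_val)
```

All three comparisons should return TRUE, which suggests that the manual formulas match what t.test() reports.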
According to the non-parametric Wilcoxon test, there is no statistically significant difference between the groups, either:
> wilcox.test(pym_high$imag, pym_low$imag, correct = FALSE, conf.int = TRUE)
Wilcoxon rank sum test
data: pym_high$imag and pym_low$imag
W = 1441.5, p-value = 0.258
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
-0.2000116 0.7400222
sample estimates:
difference in location
0.2000102
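The W statistic reported by wilcox.test() is based on ranks rather than on the raw scores. A minimal sketch, again with invented vectors x and y rather than the pym data, shows two equivalent ways of obtaining it: from the rank sum of the first group, and by counting the pairs in which a value from the first group exceeds a value from the second (ties count as one half, which is how a fractional value such as W = 1441.5 can arise).

```r
# Hypothetical scores without ties, invented for illustration only
# (these are NOT the pym_high / pym_low data)
x <- c(5.1, 6.3, 4.2, 5.8, 6.0, 4.9, 5.5, 3.9)
y <- c(4.4, 5.2, 3.8, 5.9, 4.1, 4.7, 5.0)
n1 <- length(x)

# Rank all observations together, then sum the ranks of the first group
r <- rank(c(x, y))
W_ranks <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2

# Equivalent definition: number of (x, y) pairs with x > y,
# plus one half for every tied pair
W_pairs <- sum(outer(x, y, ">")) + 0.5 * sum(outer(x, y, "=="))

# Compare with the built-in test
res <- wilcox.test(x, y)
all.equal(unname(res$statistic), W_ranks)
all.equal(W_ranks, W_pairs)
```

Because only the ordering of the values enters the computation, the test is less sensitive to outliers and skewed distributions than the t-test.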
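The two tests thus lead to the same conclusion: neither p-value (0.2875 and 0.258) falls below the conventional 0.05 threshold. Note that both tests are unpaired (independent) and two-tailed.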
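Unpaired and two-tailed are the defaults of t.test() and wilcox.test(), which is why no extra arguments were needed above. The sketch below, using invented vectors rather than the pym data, spells the defaults out explicitly and contrasts them with a directional (one-tailed) alternative.

```r
# Hypothetical scores, invented for illustration only
# (these are NOT the pym_high / pym_low data)
x <- c(5.1, 6.3, 4.2, 5.8, 6.0, 4.9, 5.5, 3.9)
y <- c(4.4, 5.2, 3.8, 5.9, 4.1, 4.7, 5.0)

# Default call and the same call with the defaults written out
res_default  <- t.test(x, y)
res_explicit <- t.test(x, y, paired = FALSE, alternative = "two.sided")
all.equal(res_default$p.value, res_explicit$p.value)

# Directional hypothesis: the mean of x is greater than the mean of y
res_greater <- t.test(x, y, alternative = "greater")

# Here the observed difference is in the predicted direction (mean(x) > mean(y)),
# so the one-tailed p-value is half of the two-tailed one
all.equal(res_greater$p.value, res_default$p.value / 2)
```

A directional test should only be chosen when the direction of the difference was predicted before looking at the data.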