Ch. 5 | Exercise 1

Consider the imagery scores of the high- and low-frequency nouns, available as the imag variable in the data frames pym_high and pym_low. Which group of nouns would you expect to have higher imagery scores? Compute the means and medians. Use an appropriate parametric and non-parametric test to check if the difference between the groups is statistically significant. Are the results of the tests similar?

In this exercise we will test a non-directional hypothesis of difference between the groups. First, load the package and the data. Next, compute the means and the medians:

> library(Rling)
> data(pym_high)
> data(pym_low)
> mean(pym_high$imag)
[1] 5.1706
> mean(pym_low$imag)
[1] 4.884902
> median(pym_high$imag)
[1] 5.05
> median(pym_low$imag)
[1] 5.23

The high-frequency nouns have a higher mean imagery score but a lower median than the low-frequency nouns. According to the parametric t-test, the difference between the means is not statistically significant:

> t.test(pym_high$imag, pym_low$imag)

        Welch Two Sample t-test

data:  pym_high$imag and pym_low$imag
t = 1.0694, df = 98.463, p-value = 0.2875
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.2444499  0.8158460
sample estimates:
mean of x mean of y
 5.170600  4.884902
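To see where the t, df and p-value in this kind of output come from, here is a minimal sketch that recomputes the Welch statistic by hand in base R. The vectors x and y are hypothetical toy scores made up for illustration, not the pym_high and pym_low data, so the numbers will differ from the output above.

```r
# Hypothetical toy scores (NOT the pym_high / pym_low data)
x <- c(5.1, 6.2, 4.8, 5.9, 6.5, 4.4, 5.7, 6.0)
y <- c(4.9, 5.3, 3.8, 6.1, 4.2, 5.0, 4.6)

# Squared standard error of each group mean
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Welch t statistic: difference in means over the combined standard error
t_manual <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation of the degrees of freedom
df_manual <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Two-tailed p-value from the t distribution
p_manual <- 2 * pt(abs(t_manual), df = df_manual, lower.tail = FALSE)

# Compare with the built-in test (Welch is the default, var.equal = FALSE)
res <- t.test(x, y)
all.equal(unname(res$statistic), t_manual)
all.equal(unname(res$parameter), df_manual)
all.equal(res$p.value, p_manual)
```

Because the Welch version does not assume equal variances, the degrees of freedom are usually not a whole number, which is why the output above reports df = 98.463.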

According to the non-parametric Wilcoxon test, there is no statistically significant difference between the groups, either:

> wilcox.test(pym_high$imag, pym_low$imag, correct = FALSE, conf.int = TRUE)

        Wilcoxon rank sum test

data:  pym_high$imag and pym_low$imag
W = 1441.5, p-value = 0.258
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -0.2000116  0.7400222
sample estimates:
difference in location
             0.2000102
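The W statistic is based on ranks rather than on the raw scores. The following sketch, again on hypothetical toy vectors rather than the pym data, recomputes W and the normal-approximation p-value without continuity correction (as requested by correct = FALSE). The argument exact = FALSE is added here only to force the normal approximation on such a small toy sample.

```r
# Hypothetical toy scores (NOT the pym_high / pym_low data)
x <- c(5.1, 6.2, 4.8, 5.9, 6.5, 4.4, 5.7, 6.0)
y <- c(4.9, 5.3, 3.8, 6.1, 4.2, 5.0, 4.6)
n_x <- length(x)
n_y <- length(y)

# Rank all observations together (ties would receive average ranks)
r <- rank(c(x, y))

# W: rank sum of the first sample minus its smallest possible value
w_manual <- sum(r[seq_len(n_x)]) - n_x * (n_x + 1) / 2

# Normal approximation, with a variance adjustment for tied ranks
ties <- table(r)
sigma <- sqrt((n_x * n_y / 12) *
  ((n_x + n_y + 1) - sum(ties^3 - ties) / ((n_x + n_y) * (n_x + n_y - 1))))
z <- (w_manual - n_x * n_y / 2) / sigma
p_manual <- 2 * pnorm(-abs(z))

# Compare with the built-in test
res <- wilcox.test(x, y, exact = FALSE, correct = FALSE)
all.equal(unname(res$statistic), w_manual)
all.equal(res$p.value, p_manual)
```

Under the null hypothesis W is expected to be close to n_x * n_y / 2, so the further W lies from that value, the smaller the p-value.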

Note that both tests are unpaired (independent) and two-tailed.
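Both properties follow from the default arguments of t.test() and wilcox.test(). As a small sketch on hypothetical toy vectors (not the pym data), spelling out alternative = "two.sided" and paired = FALSE gives the same p-values as the bare calls, while alternative = "greater" shows what a directional test would look like instead.

```r
# Hypothetical toy scores (NOT the pym_high / pym_low data)
x <- c(5.1, 6.2, 4.8, 5.9, 6.5, 4.4, 5.7, 6.0)
y <- c(4.9, 5.3, 3.8, 6.1, 4.2, 5.0, 4.6)

# Bare calls rely on the defaults
t_default <- t.test(x, y)
w_default <- wilcox.test(x, y, exact = FALSE, correct = FALSE)

# The same tests with the defaults written out explicitly
t_explicit <- t.test(x, y, alternative = "two.sided", paired = FALSE)
w_explicit <- wilcox.test(x, y, alternative = "two.sided", paired = FALSE,
                          exact = FALSE, correct = FALSE)

identical(t_default$p.value, t_explicit$p.value)
identical(w_default$p.value, w_explicit$p.value)

# A directional (one-tailed) alternative: is the mean of x greater than that of y?
t_greater <- t.test(x, y, alternative = "greater")

# Here mean(x) > mean(y), so the one-tailed p-value is half the two-tailed one
all.equal(t_greater$p.value, t_default$p.value / 2)
```

A directional alternative should be chosen before looking at the data; the exercise tests a non-directional hypothesis, so the defaults are appropriate.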