Ch. 3 | Exercise 1

Table 3.1 shows fictitious data with reaction times of ten subjects in a lexical decision task.

Table 3.1. Fictitious data with reaction times of ten subjects

                   Subj 1  Subj 2  Subj 3  Subj 4  Subj 5  Subj 6  Subj 7  Subj 8  Subj 9  Subj 10
Reaction time, ms     583     667    1149     827     488     452      NA     739     455      472

1. Create a vector with the reaction times in R.

2. Compute the mean and the median. Attention: the data contain a missing value.

3. Compute the main dispersion statistics.

4. Find the outliers according to the IQR criterion (boxplot), z-scores and MAD-scores. Are the results the same?

5. Change the value of the outlier. Use the mean plus two standard deviations, that is, the value whose z-score is 2.

> rt <- c(583, 667, 1149, 827, 488, 452, NA, 739, 455, 472)
> mean(rt, na.rm = TRUE)
[1] 648
> median(rt, na.rm = TRUE)
[1] 583
> range(rt, na.rm = TRUE)
[1]  452 1149
> 1149 - 452  # range
[1] 697
> var(rt, na.rm = TRUE)
[1] 53518.75
> sd(rt, na.rm = TRUE)
[1] 231.3412
> IQR(rt, na.rm = TRUE)
[1] 267
> mad(rt, constant = 1, na.rm = TRUE)
[1] 128
> boxplot(rt)
> boxplot.stats(rt)$out  # outlier according to the 1.5*IQR criterion
[1] 1149
> library(Rling)
> normalize(rt)
 [1] -0.28097027  0.08212977  2.16563242  0.77374891 -0.69161914 -0.84723344
 [7]          NA  0.39335838 -0.83426558 -0.76078105
> normalize(rt, method = "mad")
 [1]  0.0000000  0.4426346  2.9825138  1.2857480 -0.5005986 -0.6902991
 [7]         NA  0.8220356 -0.6744908 -0.5849100

According to the IQR criterion, the only outlier is the third reaction time (1149 ms). Its z-score is 2.17 and its MAD-score is 2.98; both are by far the largest absolute scores in the data.
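The three criteria can also be checked by hand in base R, which makes the cut-offs explicit. The following is a minimal sketch, not the Rling implementation. The cut-offs of 2 for the z-scores and 2.5 for the MAD-scores are assumptions chosen for illustration (the exercise does not prescribe them), and the variable names are hypothetical. The MAD-scores use R's default scaling constant of 1.4826, which is consistent with the normalize() output shown earlier.

```r
rt <- c(583, 667, 1149, 827, 488, 452, NA, 739, 455, 472)

# 1.5*IQR criterion, based on the hinges that boxplot.stats() also uses
hinges <- fivenum(rt, na.rm = TRUE)[c(2, 4)]     # lower and upper hinge
fences <- hinges + c(-1.5, 1.5) * diff(hinges)   # lower and upper fence
iqr_out <- which(rt < fences[1] | rt > fences[2])

# z-scores: distance from the mean in standard deviations
z <- (rt - mean(rt, na.rm = TRUE)) / sd(rt, na.rm = TRUE)
z_out <- which(abs(z) > 2)                       # assumed cut-off

# MAD-scores: distance from the median in scaled median absolute deviations
m <- (rt - median(rt, na.rm = TRUE)) / mad(rt, na.rm = TRUE)
mad_out <- which(abs(m) > 2.5)                   # assumed cut-off

# Positions flagged by each criterion (which() silently drops the NA)
iqr_out
z_out
mad_out
```

With these cut-offs, all three criteria flag only the third observation. Under a stricter z-score cut-off of 3, it would not be flagged (2.17 < 3), so whether the results agree depends on the threshold chosen.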

First, compute the required score:

> mean(rt, na.rm = TRUE) + 2*sd(rt, na.rm = TRUE)
[1] 1110.682

Second, replace the outlying score (ID = 3) with the new score:

> rt_new <- rt
> rt_new[3] <- 1110.682
> rt_new
[1]  583.000  667.000 1110.682  827.000  488.000  452.000       NA  739.000
[9]  455.000  472.000
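The same replacement can be written without hard-coding the position or the rounded value. This is a sketch under the assumption that the outlier is identified with the 1.5*IQR rule via boxplot.stats(); the names cap and out are hypothetical.

```r
rt <- c(583, 667, 1149, 827, 488, 452, NA, 739, 455, 472)

# Replacement value: two standard deviations above the mean
cap <- mean(rt, na.rm = TRUE) + 2 * sd(rt, na.rm = TRUE)

# Values flagged by the 1.5*IQR rule
out <- boxplot.stats(rt)$out

# Copy the data and overwrite only the flagged values; the NA stays as it is
rt_new <- rt
rt_new[rt %in% out] <- cap

round(rt_new, 3)
```

Because the replacement is driven by the flagged values rather than by a fixed index, the same lines work unchanged if the data are reordered.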