Chapter 3 | Exercise 1
Table 3.1 shows fictitious data with reaction times of ten subjects in a lexical decision task.
| | Subj 1 | Subj 2 | Subj 3 | Subj 4 | Subj 5 | Subj 6 | Subj 7 | Subj 8 | Subj 9 | Subj 10 |
---|---|---|---|---|---|---|---|---|---|---|
Reaction time, ms | 583 | 667 | 1149 | 827 | 488 | 452 | NA | 739 | 455 | 472 |
Create a vector with the reaction times in R.
Compute the mean and the median. Note that the data contain a missing value.
Compute the main dispersion statistics.
Find the outliers according to the IQR criterion (boxplot), z-scores and MAD-scores. Are the results the same?
Replace the outlier with the value that lies two standard deviations above the mean (i.e., the value whose z-score is 2).
> rt <- c(583, 667, 1149, 827, 488, 452, NA, 739, 455, 472)
> mean(rt, na.rm = TRUE)
[1] 648
> median(rt, na.rm = TRUE)
[1] 583
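As a brief illustration of why `na.rm = TRUE` is needed in the calls above, the following sketch shows that the default behaviour propagates the missing value. The vector is re-created so that the snippet runs on its own, and `rt_complete` is a hypothetical name that is not part of the solution.

```r
# Reaction times as entered in the solution; subject 7 is missing.
rt <- c(583, 667, 1149, 827, 488, 452, NA, 739, 455, 472)

# By default, a single NA in the input makes the result NA.
mean(rt)
median(rt)

# Count the missing values, then drop them explicitly.
sum(is.na(rt))
rt_complete <- rt[!is.na(rt)]   # hypothetical helper vector
mean(rt_complete)               # equivalent to mean(rt, na.rm = TRUE)
median(rt_complete)             # equivalent to median(rt, na.rm = TRUE)
```

Dropping the missing value once can be convenient when many statistics follow, but it changes the positions of the later subjects, so `na.rm = TRUE` is the safer choice when subject IDs matter.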
> range(rt, na.rm = TRUE)
[1] 452 1149
> 1149 - 452 # range as a single number
[1] 697
> var(rt, na.rm = TRUE)
[1] 53518.75
> sd(rt, na.rm = TRUE)
[1] 231.3412
> IQR(rt, na.rm = TRUE)
[1] 267
> mad(rt, constant = 1, na.rm = TRUE)
[1] 128
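To make the definitions behind these dispersion statistics explicit, here is a minimal sketch that recomputes the variance, the standard deviation and the unscaled MAD by hand. The names `x`, `n` and `v` are hypothetical helpers, not part of the solution.

```r
rt <- c(583, 667, 1149, 827, 488, 452, NA, 739, 455, 472)
x <- rt[!is.na(rt)]   # complete cases only
n <- length(x)

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
v <- sum((x - mean(x))^2) / (n - 1)
v         # should agree with var(rt, na.rm = TRUE)
sqrt(v)   # should agree with sd(rt, na.rm = TRUE)

# Unscaled MAD: the median of the absolute deviations from the median.
median(abs(x - median(x)))   # should agree with mad(rt, constant = 1, na.rm = TRUE)
```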
> boxplot(rt)
> boxplot.stats(rt)$out # outliers according to the 1.5*IQR criterion
[1] 1149
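The 1.5*IQR rule can also be spelled out step by step. The sketch below uses `quantile()` with R's default settings; `boxplot.stats()` itself works with Tukey's hinges, which can differ slightly from these quartiles for other sample sizes but coincide for the nine observed values here. The names `q`, `iqr`, `lower`, `upper` and `outlier_idx` are hypothetical.

```r
rt <- c(583, 667, 1149, 827, 488, 452, NA, 739, 455, 472)

# First and third quartiles and the interquartile range.
q <- quantile(rt, probs = c(0.25, 0.75), na.rm = TRUE)
iqr <- unname(q[2] - q[1])

# Fences located 1.5 * IQR below Q1 and above Q3.
lower <- unname(q[1]) - 1.5 * iqr
upper <- unname(q[2]) + 1.5 * iqr

# Positions of the values outside the fences; which() ignores the NA.
outlier_idx <- which(rt < lower | rt > upper)
outlier_idx
rt[outlier_idx]
```

Working with the index rather than the value keeps track of which subject produced the outlying reaction time.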
> library(Rling)
> normalize(rt)
[1] -0.28097027 0.08212977 2.16563242 0.77374891 -0.69161914 -0.84723344
[7] NA 0.39335838 -0.83426558 -0.76078105
> normalize(rt, method = "mad")
[1] 0.0000000 0.4426346 2.9825138 1.2857480 -0.5005986 -0.6902991
[7] NA 0.8220356 -0.6744908 -0.5849100
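According to the 1.5*IQR criterion, the third reaction time (1149 ms) is an outlier. It is also the most extreme value on the other two scales, with a z-score of 2.17 and a MAD-score of 2.98. Whether the three methods agree therefore depends on the cut-off one adopts: on both scales the value exceeds a cut-off of 2 but not a cut-off of 3.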
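For readers without the Rling package, the two kinds of scores can be sketched in base R. This is an assumption about what `normalize()` computes, based only on the fact that the formulas below are consistent with the values printed above; note that the MAD-scores match when `mad()` is used with its default scaling constant of 1.4826, not with `constant = 1`. The names `z` and `m` are hypothetical.

```r
rt <- c(583, 667, 1149, 827, 488, 452, NA, 739, 455, 472)

# z-scores: distance from the mean, in standard deviations.
z <- (rt - mean(rt, na.rm = TRUE)) / sd(rt, na.rm = TRUE)

# MAD-scores: distance from the median, in scaled MADs
# (mad() multiplies the raw MAD by 1.4826 by default).
m <- (rt - median(rt, na.rm = TRUE)) / mad(rt, na.rm = TRUE)

# Scores of the third observation, rounded to two decimals.
round(z[3], 2)
round(m[3], 2)
```

The missing observation stays `NA` in both score vectors, so the positions still correspond to the subject numbers.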
First, compute the required score:
> mean(rt, na.rm = TRUE) + 2*sd(rt, na.rm = TRUE)
[1] 1110.682
Second, replace the outlying score (ID = 3) with the new score:
> rt_new <- rt
> rt_new[3] <- 1110.682
> rt_new
[1] 583.000 667.000 1110.682 827.000 488.000 452.000 NA 739.000
[9] 455.000 472.000
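The same replacement can be written without typing the index or the rounded value by hand. The following is a minimal sketch under the assumption that the outliers are the values flagged by `boxplot.stats()`; the names `cap` and `out_idx` are hypothetical.

```r
rt <- c(583, 667, 1149, 827, 488, 452, NA, 739, 455, 472)

# Replacement value: two standard deviations above the mean.
cap <- mean(rt, na.rm = TRUE) + 2 * sd(rt, na.rm = TRUE)

# Positions of the values flagged by the boxplot rule.
out_idx <- which(rt %in% boxplot.stats(rt)$out)

# Work on a copy so that the original data stay untouched.
rt_new <- rt
rt_new[out_idx] <- cap
round(rt_new, 3)
```

Unlike the typed-in `1110.682`, this version stores the replacement value at full precision and would still work if the data, and hence the mean and the standard deviation, changed.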