Chapter 7 | Exercise 1
For this case study, you will need the data set ldt from the Rling package, which was discussed in Chapters 3 and 6.
Fit a linear regression model with the mean reaction times as the dependent variable and log-transformed corpus word frequency Freq (add 1 to avoid -Inf) and word length Length as predictors. Is the model significant? What is the predictive power of the model? What is the effect of the predictors on the response? Are these effects statistically significant?
Compute the 95% and 99% confidence intervals of the regression estimates in the model.
According to the fitted model, how much time on average would it take to recognize a word with 5 letters and the corpus frequency of 100? Compute manually the fitted value.
Which variables survive backward, forward and bidirectional stepwise selection?
Check if the linearity assumption is met with the help of the component-residual plot.
Check if the residuals are distributed homoscedastically.
Test the model for multicollinearity between the predictors. Is it acceptable?
Are the residuals distributed normally?
Find two dangerous outliers and fit a new model without them. Can you see the difference in the new model? What about the distribution of residuals and heteroscedasticity?
Check if the new model overfits the data by using 200 bootstrap samples.
Test whether there is significant interaction between the explanatory variables in the new model.
Load the data and fit a linear regression model:
> library(Rling)
> data(ldt)
> m <- lm(Mean_RT ~ Length + log1p(Freq), data = ldt)
> summary(m)

Call:
lm(formula = Mean_RT ~ Length + log1p(Freq), data = ldt)

Residuals:
    Min      1Q  Median      3Q     Max
-237.14  -72.58  -13.03   46.35  565.58

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  714.282     63.473  11.253  < 2e-16 ***
Length        26.132      5.237   4.990 2.66e-06 ***
log1p(Freq)  -21.313      4.971  -4.287 4.27e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 111.9 on 97 degrees of freedom
Multiple R-squared:  0.477,  Adjusted R-squared:  0.4662
F-statistic: 44.24 on 2 and 97 DF,  p-value: 2.22e-14
The p-value based on the F-statistic is very small, which means that the model as a whole is significant. The R² statistic is 0.477, so the model explains slightly less than half of the variance in the reaction times; it has some explanatory power, although probably not all relevant factors are taken into account. The estimated coefficient of Length is 26.132. This means that, with frequency held constant, every additional letter of a stimulus increases the reaction time by 26.132 ms on average. The coefficient of log-transformed Freq is -21.313. This means that, with length held constant, every one-unit increase in log(Freq + 1) decreases the reaction time by 21.313 ms on average. Both effects are statistically significant (p < 0.001).
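Because frequency enters the model on a logarithmic scale, the coefficient of log1p(Freq) may be easier to read as the effect of a proportional change in frequency. The following is a minimal sketch that copies the estimate from the summary above. The helper effect_of_factor() is a hypothetical name introduced for illustration, and the calculation treats Freq + 1 as the quantity being multiplied, which differs noticeably from Freq only for very rare words.

```r
# Estimate of log1p(Freq), copied from summary(m) above
b_logfreq <- -21.313

# Multiplying (Freq + 1) by k changes log1p(Freq) by log(k),
# so the expected change in reaction time is b_logfreq * log(k).
# effect_of_factor() is a hypothetical helper, not part of any package.
effect_of_factor <- function(k) b_logfreq * log(k)

effect_of_factor(2)   # doubling Freq + 1: about -14.8 ms
effect_of_factor(10)  # tenfold increase:  about -49.1 ms
```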
Compute the 95% confidence intervals (the default):
> confint(m)
                2.5 %    97.5 %
(Intercept) 588.30431 840.25871
Length       15.73760  36.52571
log1p(Freq) -31.17832 -11.44674
Compute the 99% confidence intervals:
> confint(m, level = 0.99)
                0.5 %     99.5 %
(Intercept) 547.50710 881.055913
Length       12.37153  39.891786
log1p(Freq) -34.37332  -8.251747
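For a linear model these intervals are the estimate plus or minus a t quantile times the standard error. As a rough check, the sketch below reproduces the interval for Length from the rounded values printed in the model summary. The helper ci() is a hypothetical name, and small discrepancies in the last digits are to be expected because of rounding.

```r
# Values copied from summary(m) for the Length coefficient
est <- 26.132  # estimate
se  <- 5.237   # standard error
df  <- 97      # residual degrees of freedom

# Two-sided interval: estimate +/- t quantile * standard error
# (ci() is a hypothetical helper for illustration)
ci <- function(level) est + c(-1, 1) * qt(1 - (1 - level) / 2, df) * se

ci(0.95)  # compare with the Length row of confint(m)
ci(0.99)  # compare with the Length row of confint(m, level = 0.99)
```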
The fitted value, computed by hand from the rounded coefficients, is about 747 ms:
> 714.28 + 5*26.13 - 21.31*log1p(100)
[1] 746.5818
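The same calculation can be wrapped in a small function so that other word lengths and frequencies are easy to try. This is only a sketch: the coefficients are copied from the model summary with three decimals, and fitted_rt() is a hypothetical helper name. If the fitted model m is available in the session, the commented predict() call should give the corresponding value from the unrounded coefficients.

```r
# Coefficients copied from summary(m); rounding makes the result approximate
b <- c(intercept = 714.282, length = 26.132, logfreq = -21.313)

# fitted_rt() is a hypothetical helper: predicted mean reaction time in ms
fitted_rt <- function(length, freq) {
  unname(b["intercept"] + b["length"] * length + b["logfreq"] * log1p(freq))
}

fitted_rt(5, 100)  # about 746.58

# With the fitted model object in memory, the equivalent call would be:
# predict(m, newdata = data.frame(Length = 5, Freq = 100))
```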
Both variables survive all three variable selection procedures. The code is as follows:
> m0 <- lm(Mean_RT ~ 1, data = ldt)
> step(m0, scope = ~ Length + log1p(Freq), direction = "forward") # forward selection
> step(m, direction = "backward") # backward selection
> step(m0, scope = ~ Length + log1p(Freq)) # bidirectional selection
All three methods converge: both explanatory variables contribute to the model substantially.
The component-residual plots do not reveal marked deviations from linearity:
> library(car) #if you haven’t loaded it yet
> crPlot(m, var = "Length")
> crPlot(m, var = "log1p(Freq)")
> library(car) #if you haven’t loaded it yet
> plot(m, which = 1)
> ncvTest(m)
Non-constant Variance Score Test
Variance formula: ~ fitted.values
Chisquare = 11.85373    Df = 1     p = 0.0005754606
> ncvTest(m, ~ Length)
Non-constant Variance Score Test
Variance formula: ~ Length
Chisquare = 16.11164    Df = 1     p = 5.971584e-05
> ncvTest(m, ~ log1p(Freq))
Non-constant Variance Score Test
Variance formula: ~ log1p(Freq)
Chisquare = 3.289005    Df = 1     p = 0.06974526
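The diagnostic plot and the non-constant variance tests point to heteroscedasticity: the residual variance changes with the fitted values (p < 0.001) and, in particular, with word length (p < 0.001), whereas the test for log-transformed frequency is not significant at the 0.05 level (p = 0.07).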
> library(car) # if you haven’t done so yet
> vif(m)
     Length log1p(Freq)
   1.356621    1.356621
The VIF scores are close to 1 and well below the commonly used thresholds of 5 or 10, so there is no reason to suspect multicollinearity.
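With only two predictors, both VIF scores are necessarily identical, because each equals 1 / (1 - r²), where r is the correlation between Length and log1p(Freq). The sketch below inverts this relationship to show how strongly the two predictors are correlated. It uses only the value printed above; the variable names are chosen for illustration.

```r
# VIF copied from vif(m) above
vif_value <- 1.356621

# For two predictors, VIF = 1 / (1 - r^2), so r^2 = 1 - 1 / VIF
r_squared <- 1 - 1 / vif_value

# Absolute correlation between the predictors; the sign cannot be
# recovered from the VIF alone
abs_r <- sqrt(r_squared)

round(c(r_squared = r_squared, abs_r = abs_r), 3)
```

A shared variance of roughly a quarter is a moderate association, which is consistent with the conclusion that multicollinearity is not a concern here.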
> shapiro.test(m$residuals)

        Shapiro-Wilk normality test

data:  m$residuals
W = 0.89934, p-value = 1.317e-06
The residuals are not normally distributed.
> library(car) # if you haven’t done so yet
> influencePlot(m, id.method = "identify")
The word diacritical has a very large residual and a very high Cook’s distance, followed by dessertspoon. A new model is fitted without these two observations (rows 29 and 100 of ldt):
> m1 <- lm(Mean_RT ~ Length + log1p(Freq), data = ldt[-c(29, 100),])
> summary(m1)
[output omitted]
From the summary one can see that the coefficient of Length has become smaller, and both R² measures have improved. Moreover, the non-constant variance tests no longer show significant evidence of heteroscedasticity at the 0.05 level, and the residuals no longer deviate significantly from a normal distribution:
> ncvTest(m1)
Non-constant Variance Score Test
Variance formula: ~ fitted.values
Chisquare = 3.002491    Df = 1     p = 0.08313662
> ncvTest(m1, ~ Length)
Non-constant Variance Score Test
Variance formula: ~ Length
Chisquare = 3.720857    Df = 1     p = 0.05373677
> shapiro.test(m1$residuals)

        Shapiro-Wilk normality test

data:  m1$residuals
W = 0.98278, p-value = 0.2288
> library(rms)
> m.ols <- ols(Mean_RT ~ Length + log1p(Freq), data = ldt[-c(29, 100),], x = TRUE, y = TRUE)
> validate(m.ols, B = 200)
[output omitted]
Because the algorithm draws random samples from the original data set, the results vary from one run to another. The optimism of the slope should be about 0.01 or smaller, so there is no evidence of overfitting.
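The idea behind the optimism estimate can be sketched in base R. The example below is an illustration on simulated data, not a re-analysis of ldt: the data frame sim, its generating coefficients and the helper r2() are all made up for the purpose, and the statistic tracked is R² rather than the slope reported by validate(). In each bootstrap sample the model is refitted, and its performance on that sample is compared with its performance on the original data; the average difference is the optimism.

```r
set.seed(42)  # fixes the random draws so that the run is repeatable

# Simulated stand-in data (an assumption, NOT the ldt data set)
n <- 98
sim <- data.frame(Length = sample(3:14, n, replace = TRUE),
                  Freq   = rexp(n, rate = 1 / 500))
sim$Mean_RT <- 700 + 25 * sim$Length - 20 * log1p(sim$Freq) + rnorm(n, sd = 90)

form <- Mean_RT ~ Length + log1p(Freq)

# Hypothetical helper: R-squared of a fitted model evaluated on any data set
r2 <- function(fit, data) {
  pred <- predict(fit, newdata = data)
  1 - sum((data$Mean_RT - pred)^2) /
      sum((data$Mean_RT - mean(data$Mean_RT))^2)
}

# Apparent performance: model fitted and evaluated on the same data
apparent <- r2(lm(form, data = sim), sim)

# Optimism in each of 200 bootstrap samples: performance on the
# bootstrap sample minus performance on the original data
optimism <- replicate(200, {
  boot <- sim[sample(n, replace = TRUE), ]
  fit  <- lm(form, data = boot)
  r2(fit, boot) - r2(fit, sim)
})

c(apparent  = apparent,
  optimism  = mean(optimism),
  corrected = apparent - mean(optimism))
```

A mean optimism close to zero indicates that the model performs about as well on new data as on the data it was fitted to.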
> m.int <- lm(Mean_RT ~ Length*log1p(Freq), data = ldt[-c(29, 100),])
> anova(m1, m.int)
Analysis of Variance Table

Model 1: Mean_RT ~ Length + log1p(Freq)
Model 2: Mean_RT ~ Length * log1p(Freq)
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     95 757650
2     94 753181  1    4469.3 0.5578  0.457
From the large p-value we can infer that the interaction is not statistically significant.
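The F-statistic in this table compares the reduction in the residual sum of squares achieved by the interaction term with the residual variance of the larger model. As a sketch, it can be recomputed from the rounded values printed by anova(); the variable names are chosen for illustration, and the last digits may differ slightly because of rounding.

```r
# Values copied from the anova() table above
rss_add <- 757650; df_add <- 95  # model without the interaction
rss_int <- 753181; df_int <- 94  # model with the interaction

# F = (drop in RSS per extra parameter) / (residual variance of larger model)
f_stat <- ((rss_add - rss_int) / (df_add - df_int)) / (rss_int / df_int)

# Upper-tail probability of the F distribution with 1 and 94 df
p_val <- pf(f_stat, df_add - df_int, df_int, lower.tail = FALSE)

round(c(F = f_stat, p = p_val), 4)
```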