Chapter 14 | Exercise 2
Using the same data as in Exercise 1, grow a random forest with 1000 trees and compute the conditional variable importance scores. Create a dot chart to visualize the importance of each variable. Compute the C-index and the accuracy statistic.
> library(Rling)
> data(nerd)
> library(party)
> nerd.rf <- cforest(Noun ~ Num + Century + Register + Eval, data = nerd, controls = cforest_unbiased(ntree = 1000, mtry = 2))
> nerd.varimp <- varimp(nerd.rf, conditional = TRUE)
> round(nerd.varimp, 3)
    Num Century Register     Eval
 -0.002   0.024   -0.002    0.055
> dotchart(sort(nerd.varimp), main = "Conditional importance of variables")
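Only Eval and Century have importance scores clearly above zero. The scores for Num and Register are slightly negative, which means they are effectively zero, so these two variables contribute nothing to the predictions.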
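A rule of thumb often used with these scores treats a variable as informative when its score exceeds the absolute value of the lowest negative score. The sketch below applies that rule to the rounded scores printed above. The vector is retyped by hand so that the example runs without the model, and the names `scores`, `threshold` and `important` are introduced here for illustration only.

```r
# Rounded conditional importance scores copied from the output above
scores <- c(Num = -0.002, Century = 0.024, Register = -0.002, Eval = 0.055)

# Rule of thumb (an assumption of this sketch, not part of the exercise):
# the size of the lowest negative score estimates the noise level
threshold <- abs(min(scores))

# Variables whose score exceeds that noise level
important <- names(scores)[scores > threshold]
print(important)  # Century and Eval
```

With your own model, replace `scores` with `nerd.varimp`. Since the scores change from run to run, the threshold will change as well.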
> nerd.rf.pred <- unlist(treeresponse(nerd.rf))[c(FALSE, TRUE)]
> library(Hmisc)
> somers2(nerd.rf.pred, as.numeric(nerd$Noun) - 1)
           C          Dxy            n      Missing
   0.6966776    0.3933552 1316.0000000    0.0000000
The concordance index C is 0.697.
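To see what the two steps above do, here is a minimal base R sketch on invented toy data. It assumes that the fitted probabilities come as one pair per observation, with the second value being the probability of the second level (here, nerd). It then computes C by hand as the share of (nerd, geek) pairs in which the nerd case gets the higher predicted probability, with ties counting one half. The objects `probs`, `obs`, `pred`, `c_index` and `dxy` are hypothetical and not part of the exercise.

```r
# Toy stand-in for a list of fitted probabilities: one c(P(geek), P(nerd))
# pair per observation. The numbers are invented for illustration.
probs <- list(c(0.8, 0.2), c(0.3, 0.7), c(0.4, 0.6), c(0.2, 0.8), c(0.5, 0.5))
obs <- c(0, 0, 1, 1, 0)  # observed outcome: 1 = nerd, 0 = geek

# unlist() flattens the pairs into one vector; the logical index
# c(FALSE, TRUE) is recycled, so every second value, P(nerd), is kept
pred <- unlist(probs)[c(FALSE, TRUE)]

# Compare every nerd case with every geek case
pos <- pred[obs == 1]
neg <- pred[obs == 0]
diffs <- outer(pos, neg, "-")

# Concordant pairs count 1, tied pairs count 0.5
c_index <- (sum(diffs > 0) + 0.5 * sum(diffs == 0)) / length(diffs)

# Somers' Dxy is a rescaling of C to the range from -1 to 1
dxy <- 2 * c_index - 1

print(c(C = c_index, Dxy = dxy))
```

In this toy example five of the six pairs are concordant, so C is 5/6. The same relation between C and Dxy holds in the output above: 2 * 0.6966776 - 1 = 0.3933552.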
> table(predict(nerd.rf), nerd$Noun)
       geek nerd
  geek  409  225
  nerd  261  421
> (409 + 421)/nrow(nerd)
[1] 0.6306991
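The accuracy is 0.63. Because a random forest is grown from random samples of the observations and of the predictors, your results will differ slightly from run to run unless you fix the random seed with set.seed() before fitting the model.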
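The accuracy can also be read off the confusion table without typing the counts by hand. The sketch below rebuilds the table shown above as a matrix, so that it runs without the model, and compares the accuracy with a simple baseline that always predicts the more frequent noun. The names `conf`, `accuracy` and `baseline` are introduced here for illustration, and the baseline comparison is an addition to the exercise.

```r
# Confusion table copied from the output above
# (rows = predicted, columns = observed); matrix() fills column by column
conf <- matrix(c(409, 261, 225, 421), nrow = 2,
               dimnames = list(predicted = c("geek", "nerd"),
                               observed = c("geek", "nerd")))

# Accuracy: correct predictions (the diagonal) over all observations
accuracy <- sum(diag(conf)) / sum(conf)

# Baseline: share of the more frequent observed category
baseline <- max(colSums(conf)) / sum(conf)

print(round(c(accuracy = accuracy, baseline = baseline), 3))
```

With your own model, `conf <- table(predict(nerd.rf), nerd$Noun)` gives the same kind of object. In the run shown here the forest reaches about 0.63, against a baseline of about 0.51.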