Ch. 14 | Exercise 2

Using the same data as in Exercise 1, grow a random forest with 1000 trees and compute the conditional variable importance scores. Create a dot chart to visualize the importance of each variable. Compute the C-index and the accuracy statistic. 

> library(Rling)
> data(nerd)
> library(party)
> nerd.rf <- cforest(Noun ~ Num + Century + Register + Eval, data = nerd, controls = cforest_unbiased(ntree = 1000, mtry = 2))
> nerd.varimp <- varimp(nerd.rf, conditional = TRUE)
> round(nerd.varimp, 3)
     Num  Century Register     Eval
  -0.002    0.024   -0.002    0.055
> dotchart(sort(nerd.varimp), main = "Conditional importance of variables")

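Only the variables Eval and Century are important: their scores are clearly positive, whereas the scores of Num and Register are slightly negative and therefore practically zero.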
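
One common rule of thumb, which is not part of the original solution, treats a variable as informative only if its importance exceeds the absolute value of the lowest negative score. The following is a minimal sketch of that check. It assumes the rounded scores shown in the output above, typed in by hand so that it runs without refitting the forest; the names threshold and important are illustrative.

```r
# Rounded conditional importance scores copied from the output above
nerd.varimp <- c(Num = -0.002, Century = 0.024, Register = -0.002, Eval = 0.055)

# Rule-of-thumb cutoff (an assumption, not from the original solution):
# the absolute value of the lowest negative score
threshold <- abs(min(nerd.varimp))

# Variables whose importance exceeds the cutoff
important <- names(nerd.varimp)[nerd.varimp > threshold]
important

# Dot chart with dashed reference lines at zero and at the cutoff
dotchart(sort(nerd.varimp), main = "Conditional importance of variables")
abline(v = c(0, threshold), lty = 2)
```

The cutoff only makes sense when at least one score is negative, as is the case here. With your own run, replace the hand-typed vector with the object returned by varimp().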

> nerd.rf.pred <- unlist(treeresponse(nerd.rf))[c(FALSE, TRUE)]
> library(Hmisc)
> somers2(nerd.rf.pred, as.numeric(nerd$Noun) - 1)
           C          Dxy            n      Missing
   0.6966776    0.3933552 1316.0000000    0.0000000

The concordance index C is 0.697.
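
To make the statistic more concrete: C can be read as the proportion of all pairs of observations with different outcomes in which the observation coded 1 receives the higher predicted probability, with ties counted as one half, and Dxy is a simple rescaling of C. The sketch below illustrates this on invented toy data. The helper c_index is hypothetical and is not a function from Hmisc.

```r
# Hypothetical helper (not from Hmisc): C as the share of concordant pairs
c_index <- function(pred, y) {
  pos <- pred[y == 1]             # predicted probabilities where the outcome is 1
  neg <- pred[y == 0]             # predicted probabilities where the outcome is 0
  diffs <- outer(pos, neg, "-")   # one difference for every (1, 0) pair
  # Concordant pairs count as 1, tied pairs as 0.5
  (sum(diffs > 0) + 0.5 * sum(diffs == 0)) / length(diffs)
}

# Toy data, invented for illustration only
pred <- c(0.2, 0.4, 0.6, 0.8)
y    <- c(0, 1, 0, 1)

C <- c_index(pred, y)   # 3 of the 4 pairs are concordant
C

# Somers' Dxy rescales C from the range [0.5, 1] to [0, 1]
Dxy <- 2 * (C - 0.5)
Dxy

# The same rescaling applied to the C value reported above
2 * (0.6966776 - 0.5)
```

The last line reproduces the Dxy value of 0.3933552 from the somers2() output, which shows that the two statistics carry the same information.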

> table(predict(nerd.rf), nerd$Noun)

       geek nerd
  geek  409  225
  nerd  261  421
> (409 + 421)/nrow(nerd)
[1] 0.6306991

The accuracy score is 0.63. Note that your results will differ slightly from run to run, because each forest is grown from random samples of the observations and predictors.
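
The accuracy can also be computed from the cross-tabulation itself rather than by typing the counts by hand. The sketch below assumes the counts from the table above, entered manually so that it runs without the fitted model. The object names conf.mat, accuracy and baseline are illustrative, and the baseline comparison is an addition that is not part of the original solution.

```r
# Confusion matrix copied from the output above
# (rows: predicted noun, columns: observed noun)
conf.mat <- matrix(c(409, 261, 225, 421), nrow = 2,
                   dimnames = list(predicted = c("geek", "nerd"),
                                   observed = c("geek", "nerd")))

# Accuracy: correct predictions (the diagonal) divided by all predictions
accuracy <- sum(diag(conf.mat)) / sum(conf.mat)
accuracy

# Baseline for comparison: always guessing the more frequent observed noun
baseline <- max(colSums(conf.mat)) / sum(conf.mat)
baseline
```

With a fitted model, the same two lines work on the object returned by table(predict(nerd.rf), nerd$Noun). The baseline here is about 0.51, so the forest classifies noticeably better than always choosing the more frequent noun.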