Chapter 14 | Exercise 1
Using the data set nerd, which was introduced in the exercise for Chapter 12, create a conditional inference tree and visualize it. Which variables are responsible for the splits? Does the model predict the actual outcomes well? Compute the C-index and the accuracy statistic to assess the goodness of fit.
> library(Rling)
> data(nerd)
> library(party)
> nerd.ctree <- ctree(Noun ~ Num + Century + Register + Eval, data = nerd)
> plot(nerd.ctree)
The variables that are responsible for the splits are Eval and Century.
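> # treeresponse() gives the predicted probabilities of both nouns for each observation;
> # [c(FALSE, TRUE)] keeps every second value, i.e. the probability of the second level (nerd)
> nerd.ctree.pred <- unlist(treeresponse(nerd.ctree))[c(FALSE, TRUE)]
> library(Hmisc)
> # as.numeric(nerd$Noun) - 1 recodes the outcome as 0 (geek) and 1 (nerd)
> somers2(nerd.ctree.pred, as.numeric(nerd$Noun) - 1)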
           C          Dxy            n      Missing
   0.6727935    0.3455871 1316.0000000    0.0000000
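To make the C-index less of a black box, here is a minimal base R sketch of how it can be computed by hand. The vectors prob and obs are hypothetical toy data, not taken from nerd, and the pairwise approach is an illustration of the definition rather than the code that somers2() runs internally. C is the proportion of pairs with different outcomes in which the case with outcome 1 receives the higher predicted probability, with ties counted as one half.

```r
# Hypothetical toy data (not from nerd): predicted probabilities of the
# second outcome level and the observed outcomes coded as 0/1.
prob <- c(0.2, 0.4, 0.4, 0.7, 0.9)
obs  <- c(0,   0,   1,   1,   1)

# Compare every case with outcome 1 against every case with outcome 0.
p1 <- prob[obs == 1]
p0 <- prob[obs == 0]
diffs <- outer(p1, p0, "-")

# Concordant pairs count 1, tied pairs count 0.5, discordant pairs count 0.
c_index <- (sum(diffs > 0) + 0.5 * sum(diffs == 0)) / length(diffs)

# Somers' Dxy rescales C from the range 0..1 to the range -1..1.
dxy <- 2 * (c_index - 0.5)

round(c(C = c_index, Dxy = dxy), 3)
```

The same relation between the two statistics holds in the somers2() output above: 2 * (0.6727935 - 0.5) = 0.345587.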
> table(predict(nerd.ctree), nerd$Noun)
       geek nerd
  geek  227   61
  nerd  443  585
> (227 + 585)/nrow(nerd)
[1] 0.6170213
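The accuracy can also be read directly off the confusion matrix, together with a simple baseline for comparison. The sketch below re-enters the four counts from the table() output above, so that it runs without the Rling package. The object names conf, accuracy and baseline are introduced here for illustration only.

```r
# Confusion matrix copied from the table() output above:
# rows = predicted noun, columns = observed noun.
conf <- matrix(c(227, 443, 61, 585), nrow = 2,
               dimnames = list(predicted = c("geek", "nerd"),
                               observed  = c("geek", "nerd")))

# Accuracy: correctly classified cases (the diagonal) over all cases.
accuracy <- sum(diag(conf)) / sum(conf)

# Baseline: always guessing the more frequent observed noun.
baseline <- max(colSums(conf)) / sum(conf)

round(c(accuracy = accuracy, baseline = baseline), 3)
```

Comparing the accuracy with the majority-class baseline shows how much the tree adds over a model that uses no predictors at all.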
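The concordance index is C = 0.673, below the value of 0.7 that is commonly taken as the threshold of acceptable discrimination, and the accuracy is 0.617. Because 670 of the 1316 observations contain geek, always predicting geek would already be correct 50.9% of the time, so the tree improves on this baseline only modestly. The confusion table also shows that 443 of the 670 instances of geek are misclassified as nerd. These results show that there is considerable room for improvement.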