Solved – Analyzing CART-type trees in R: options for grouping results and formatting plots

cart, data visualization, r

I'm new to using CART trees, but have been asked to do so for a project I'm working on. I've had success running the scripts (from both RPART and PARTY packages) but I can't seem to get exactly what I'm looking for. I'm working with spectral data (Red, NIR, NDVI…) for 80 trees in four categories (Mesic-control, Mesic-fertilized, Xeric-control and Xeric-fertilized). There are significant differences in the mean values for spectral bands among the four categories and I'd like to use those differences to develop an algorithm for assigning category to unknown trees.

Here's a dummy tree I made using the RPART package:
[Figure: RPART tree]

fit <- rpart(Category ~ red.top + NIR.top + R.NIR.top, method="anova", data=CCA)
plot(fit, uniform=T,main="Classification Tree for Kyphosis")
text(fit, use.n=TRUE, all=TRUE, cex=.8)

And here's another tree I made with PARTY:
[Figure: PARTY tree]

library(party)
fit <- ctree(Category ~ red.top + NIR.top + R.NIR.top, data=CCA)
plot(fit, main="Conditional Inference Tree for Kyphosis")
gtree <- ctree(Category ~ ., data = CART)
plot(gtree)

Both look fine, except they don't really do what I want. The RPART one looks good, but I can't figure out how to determine the category identity of the trees in each 'leaf'. The PARTY one is what I want, except the tree is much simpler than the regression tree in the first example. My ultimate goal is essentially to combine the two: a larger tree that uses more of the 'rules' from the data and ends in 'leaves' with categorical information and some predictive power. I'm not too hung up on whether I use regression or classification, as long as the result is useful in practice.

So, I guess what I'm really looking for is better scripts for either package that give me a more detailed tree with visual output (bar graphs on the leaves) or a way to determine the identity of the groups created by the RPART tree.

Best Answer

For a multi-class model, use the rpart.plot package to show at each leaf the predicted class and the probabilities for each class. For example:

data(iris)
library(rpart)
a <- rpart(Species~., data=iris, method="class", cp=.001, minsplit=5)
library(rpart.plot)
rpart.plot(a, type=0, extra=4, under=TRUE, branch=.3)

which gives:

[Figure: rpart.plot output for the iris tree]

This doesn't show bar charts, but the displayed class probabilities are just as informative, and more compact. I believe this is essentially the information you would like displayed on the plot.
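To see which observations ended up in each leaf (the "identity of the groups" asked about in the question), one possible approach is to use the where component of the fitted rpart object together with predict(). The following is a minimal sketch on the iris data, not the asker's CCA data; the names leaf_id, leaf_table, pred and probs are made up for illustration.

```r
library(rpart)

data(iris)
# Same model as in the example above.
a <- rpart(Species ~ ., data = iris, method = "class", cp = 0.001, minsplit = 5)

# a$where gives, for each training row, the row of a$frame holding its leaf.
# The row names of a$frame are the node numbers shown by print(a).
leaf_id <- rownames(a$frame)[a$where]

# Cross-tabulate leaf membership against the observed class.
leaf_table <- table(leaf = leaf_id, observed = iris$Species)
print(leaf_table)

# Predicted class for each training row.
pred <- predict(a, type = "class")

# Class probabilities for a few rows treated as "unknown" observations.
probs <- predict(a, newdata = iris[c(1, 51, 101), ], type = "prob")
print(probs)
```

With the asker's data, the same calls should apply once the tree is fitted with method="class" on a factor response rather than method="anova".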

Extra information can be displayed by using the type and extra arguments of the rpart.plot function. See the bottom of Figure 1 in the vignette for the rpart.plot package for another example with a multi-class response.
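As a rough illustration of how type and extra change the display, here is a sketch with two other settings on the same iris tree. The particular values are only examples chosen for this sketch; the rpart.plot help page lists the full set of options.

```r
library(rpart)
library(rpart.plot)

data(iris)
a <- rpart(Species ~ ., data = iris, method = "class", cp = 0.001, minsplit = 5)

# extra = 104: per-class probabilities (4) plus the percentage of
# observations falling in each node (+100).
# type = 2: label all nodes, with split labels drawn below the node labels.
res <- rpart.plot(a, type = 2, extra = 104, under = TRUE, branch = 0.3)

# extra = 2: classification rate at each node (correct / total in node).
rpart.plot(a, type = 0, extra = 2, under = TRUE, branch = 0.3)
```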
