I have operational fault data and maintenance data. I used the operational fault data to determine whether maintenance improved the fault indicator (true/false), and the maintenance data to identify which maintenance actions were performed. I then fit a model with RPART, using the maintenance actions as independent variables and operational fault reduction (true/false) as the categorical outcome. I subtracted 0.5 from the operational fault data, so the values were -0.5 and 0.5 instead of 0 and 1.
I don't understand how to interpret the plot of the rpart tree. How do I determine, or indicate, which of the bottom nodes correspond to true or false? Also, what do the colors indicate?
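R commands:
library(rpart)   # rpart()
library(rattle)  # fancyRpartPlot()
# Placeholder names: maintenance_actions = predictors, fault_improved = 0/1 outcome
subdata <- data.frame(x = maintenance_actions, y = fault_improved - 0.5)
rtreeFit <- rpart(y ~ ., data = subdata)
fancyRpartPlot(rtreeFit, main = paste('RPART:'), sub = cName)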
Is it possible to draw a histogram for each leaf showing the distribution of classifications?
Here's the updated code:
y_subdata <- factor(y_training[rowIndx])
x_subdata <- x_training[rowIndx, ]
subdata <- data.frame(x = x_subdata, y = y_subdata)
fit <- rpart(y ~ ., method = 'class', data = subdata,
             control = rpart.control(minsplit = 3, cp = 0.0001))
The numbers are hard to read, but what do the numbers mean?
Best Answer
One thing that concerns me is that at least two of the variables show up at multiple nodes. I have run a lot of recursive partitioning with RPART, and in my experience the same variable appearing at several nodes is a sign that the tree may be unreliable (e.g., nodes 1, 3, and 11 all split on "x.sum_manhours").

I am not sure why you subtracted 0.5 from your operational fault outcome variable. It looks like an attempt to center the data, but your outcome is a categorical (factor) variable, so centering means nothing. Worse, the subtraction leaves the outcome numeric, and rpart chooses its method from the type of the response: with a numeric response and no method argument, it fits a regression tree (method "anova", continuous outcome) instead of a classification tree (method "class", categorical outcome). Converting the outcome to a factor, or passing method='class', gives a classification tree.

Finally, there are bootstrapping techniques for checking the stability of your tree that you might consider.
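To make these points concrete, here is a minimal sketch on simulated stand-in data. The column names sum_manhours and n_actions, the simulated outcome, and the choice of 50 resamples are all made up for illustration and are not your data or a definitive procedure. It checks which kind of tree rpart fits for a numeric versus a factor outcome, tabulates the observed classes in each leaf, and refits on bootstrap resamples to see how often the root split variable changes.

```r
library(rpart)

set.seed(1)
n <- 200
# Hypothetical stand-in for the maintenance data (made-up predictors).
x <- data.frame(sum_manhours = runif(n, 0, 40),
                n_actions    = rpois(n, 3))
# Hypothetical outcome: 1 = fault improved, 0 = not improved.
improved <- as.integer(x$sum_manhours + rnorm(n, sd = 8) > 20)

# Numeric outcome (0/1 shifted to -0.5/0.5), no method given: regression tree.
fit_num <- rpart(y ~ ., data = data.frame(x, y = improved - 0.5))

# Factor outcome, no method given: classification tree.
dat <- data.frame(x, y = factor(improved == 1, levels = c(FALSE, TRUE)))
fit_cls <- rpart(y ~ ., data = dat)

print(fit_num$method)  # method rpart chose for the numeric outcome
print(fit_cls$method)  # method rpart chose for the factor outcome

# Observed class counts per leaf. fit_cls$where holds, for each observation,
# the row of fit_cls$frame for the leaf it fell into; the row names of
# fit_cls$frame are the node numbers shown in the plot.
leaf_id     <- rownames(fit_cls$frame)[fit_cls$where]
leaf_counts <- table(leaf = leaf_id, observed = dat$y)
print(leaf_counts)

# Simple bootstrap stability check: refit on resampled rows and record
# which variable is used for the root split each time.
root_var <- replicate(50, {
  idx <- sample(n, replace = TRUE)
  f   <- rpart(y ~ ., data = dat[idx, ])
  as.character(f$frame$var[1])  # "<leaf>" if the tree did not split
})
print(table(root_var))
```

Calling barplot(t(leaf_counts), beside = TRUE, legend.text = TRUE) on the table above would draw one group of bars per leaf, which is one way to get the per-leaf distribution asked about in the question. If the root variable (or the deeper splits) changes a lot across resamples, that suggests the tree structure should not be over-interpreted.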