Solved – Use of regression-trees to determine probabilities for a binary variable

Tags: cart, classification, probability, random forest, rpart

I have a binary variable (sold/not sold), and I have used the CART algorithm in R (rpart) to build a classification tree that predicts whether a product will be sold.
Now I would like to attach a probability to each prediction. How can I do this?
Can I just take the same training/test data as before and build a regression tree on it, or will that not work?
Can I somehow reuse the trees I got from the classification?
And how could I evaluate the result?
Or is there a better approach to this problem?
Thank you very, very much!

Best Answer

From the predict.rpart documentation: "If the rpart object is a classification tree, then the default is to return prob predictions, a matrix whose columns are the probability of the first, second, etc. class". So all you need to do is apply your fitted model to the test or new data with predict(model, newdata); type = "prob" is the default for a classification tree. If you are not getting probabilities, check whether the tree was fitted in classification mode (does summary(model) report classes, or mean/MSE?). If it was not, convert your (already binary) response variable to a factor, e.g. x <- as.factor(x), and refit. It is almost identical for a random forest in randomForest, except that there you have to request probabilities explicitly with predict(model, newdata, type = "prob"), because the default is the predicted class.
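A minimal sketch of the steps above, assuming the kyphosis data set that ships with rpart stands in for the sold/not sold data (its response Kyphosis is a two-level factor). The train/test split, the variable names, and the Brier score at the end are illustrative choices, not part of the original answer; the Brier score is just one simple way to evaluate predicted probabilities.

```r
library(rpart)

# Stand-in data: kyphosis ships with rpart; Kyphosis is a factor
# with levels "absent" and "present" (hypothetical proxy for sold/not sold).
data(kyphosis)
set.seed(1)
test_idx <- sample(nrow(kyphosis), 20)
train <- kyphosis[-test_idx, ]
test  <- kyphosis[test_idx, ]

# A numeric 0/1 response makes rpart grow a regression ("anova") tree.
train$y_num <- as.numeric(train$Kyphosis == "present")
fit_reg <- rpart(y_num ~ Age + Number + Start, data = train)
fit_reg$method   # "anova"

# A factor response gives a classification tree.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = train)
fit$method       # "class"

# For a classification tree, predict() returns class probabilities by default:
# a matrix with one column per class, each row summing to 1.
probs <- predict(fit, newdata = test)
p_present <- probs[, "present"]

# One possible evaluation: Brier score (mean squared error of the
# predicted probability against the 0/1 outcome; lower is better).
brier <- mean((p_present - (test$Kyphosis == "present"))^2)
```

If hard class labels are needed instead, predict(fit, newdata = test, type = "class") returns them from the same tree, so the existing classification tree can be reused without refitting.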
