Predictive Decision Tree in R


I am currently working on a dataset in RStudio and, as the title suggests, I am having difficulty creating the tree I'm looking for. My dataset consists of 122,151 observations of 33 variables. The dataset is already properly prepared for a tree model (no empty values, binary values, maxs/means/mins).

For ease's sake, let's call the dataset df1, the dependent variable x1, and the predictor variables y1, y2, y3, …, y32.

Using the tree package, I set up the following code:

    library(tree)
    tree <- tree(x1 ~ y2 + y3 + y4 + ... + y32, data = df1, model = FALSE)
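
Typing out `y2 + y3 + ... + y32` by hand is error-prone. As a minimal sketch, base R's `reformulate()` can build the same formula from the column names, assuming they follow the `y2`…`y32` naming used here. Note that the shortcut `x1 ~ .` would instead use every other column of df1, including y1.

```r
# Build x1 ~ y2 + y3 + ... + y32 programmatically instead of typing each term.
# Assumes the predictor columns are literally named y2, ..., y32.
predictors <- paste0("y", 2:32)
f <- reformulate(predictors, response = "x1")
print(f)

# The formula can then be used in the call from the question, e.g.
# tree <- tree::tree(f, data = df1, model = FALSE)
```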

However, this results in a tree with only one node, as seen below, whereas it is supposed to give a tree with roughly 17 nodes.

http://i57.tinypic.com/2lvelad.jpg

I suspect the problem is the distribution of the dependent variable: 341 yes (1) versus 121,000+ no (0). This imbalance seems to interfere with the prediction and effectively keeps the tree from growing.
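
To make the imbalance concrete, here is a small sketch that tabulates a hypothetical stand-in for x1 built from the counts above: 341 ones and 122151 - 341 = 121810 zeros, assuming x1 takes no other values. On the real data the equivalent check is `table(df1$x1)`. A single-node tree that always predicts "no" is already right about 99.7% of the time, which suggests why further splits improve the fit so little.

```r
# Hypothetical stand-in for df1$x1 built from the question's counts:
# 341 ones ("yes") and 122151 - 341 = 121810 zeros ("no").
x1 <- c(rep(1, 341), rep(0, 122151 - 341))

counts <- table(x1)        # on the real data: table(df1$x1)
print(counts)

shares <- prop.table(counts)
print(round(shares, 4))

# Accuracy of a single-node tree that always predicts the majority class (0)
baseline_accuracy <- max(shares)
print(baseline_accuracy)
```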

Is there a setting that gives each of the two binary outcomes of the dependent variable a 50% chance, so that the tree actually grows instead of returning a single node?

Best Answer

I think there are two possible issues here:

- First, make sure that x1 is numeric, so that a regression tree is built.

- Second, assuming you are building a regression tree, try tuning the cp (complexity parameter) argument of rpart.control (see the rpart documentation). This parameter controls tree pruning, and you will likely need a lower cp just to see more nodes and branches; beware of overfitting, though. Try, for example, cp = 0.001. Note that cp belongs to the rpart package; with the tree package used in the question, the roughly comparable setting is mindev in tree.control.
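
Both suggestions can be sketched with the rpart package (where cp lives) on simulated data. Everything here is a hypothetical stand-in: the three-predictor df1, the effect sizes, the seed, and the helper `n_leaves()`; the leaf counts you get will depend on your data. The last fit goes beyond the answer and shows one way to express the question's 50/50 idea: rpart's classification method accepts equal class priors through `parms = list(prior = ...)`.

```r
library(rpart)  # recommended package, included with standard R installs

set.seed(1)

# Hypothetical data standing in for df1: a rare 0/1 outcome x1 and three
# numeric predictors, where only y1 carries a weak signal.
n  <- 20000
y1 <- runif(n)
y2 <- runif(n)
y3 <- runif(n)
x1 <- as.numeric(runif(n) < ifelse(y1 > 0.9, 0.02, 0.002))
df1 <- data.frame(x1, y1, y2, y3)

# First point: x1 must be numeric for an "anova" (regression) tree.
stopifnot(is.numeric(df1$x1))

# Second point: default cp (0.01) versus a lower cp.
fit_default <- rpart(x1 ~ y1 + y2 + y3, data = df1, method = "anova")
fit_low_cp  <- rpart(x1 ~ y1 + y2 + y3, data = df1, method = "anova",
                     control = rpart.control(cp = 0.001))

# Count terminal nodes (leaves) in a fitted rpart tree.
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")
cat("leaves, default cp:", n_leaves(fit_default), "\n")
cat("leaves, cp = 0.001:", n_leaves(fit_low_cp), "\n")

# Cross-validated error per cp value, useful for choosing where to prune
# (e.g. with prune(fit_low_cp, cp = ...)) to limit overfitting.
printcp(fit_low_cp)

# Alternative for the 50/50 idea: a classification tree with equal
# prior probabilities for classes "0" and "1" (in factor-level order).
df1$x1_class <- factor(df1$x1)
fit_prior <- rpart(x1_class ~ y1 + y2 + y3, data = df1, method = "class",
                   parms = list(prior = c(0.5, 0.5)))
cat("leaves, class tree with 50/50 prior:", n_leaves(fit_prior), "\n")
```

Lowering cp only lets rpart keep splits with a smaller relative improvement; the split chosen at each node does not depend on cp, so the default tree is a subtree of the low-cp one.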
