Can anyone explain the primary differences between conditional inference trees (`ctree` from the `party` package in R) and more traditional decision tree algorithms (such as `rpart` in R)?
- What makes CI trees different?
- Strengths and weaknesses?
Update: I have looked at the paper by Hothorn et al. that chl refers to in the comments. I was not able to follow it completely. Can anyone explain how variables are selected using permutations (e.g. what is an influence function)?
Best Answer
For what it's worth:
Both `rpart` and `ctree` recursively perform univariate splits of the dependent variable based on the values of a set of covariates. `rpart` and related algorithms usually employ information measures (such as the Gini coefficient) for selecting the current covariate.

`ctree`, according to its authors (see chl's comments), avoids the following variable-selection bias of `rpart` (and related methods): they tend to select variables that have many possible splits or many missing values. Unlike the others, `ctree` uses a significance-test procedure in order to select variables, instead of selecting the variable that maximizes an information measure (e.g. the Gini coefficient).

The significance test, or better: the multiple significance tests computed at each step of the algorithm (select covariate, choose split, recurse) are permutation tests; that is, "the distribution of the test statistic under the null hypothesis is obtained by calculating all possible values of the test statistic under rearrangements of the labels on the observed data points" (from the Wikipedia article).
Now for the test statistic: it is computed from transformations (including the identity, i.e. no transformation) of the dependent variable and the covariates. You can choose any of a number of transformations for both variables. For the DV (dependent variable), the transformation is called the influence function you were asking about.
Examples (taken from the paper): if both the DV and the covariate are numeric, you might choose identity transformations and obtain a statistic closely related to the correlation coefficient; if both are categorical, the resulting statistic is related to the chi-squared statistic; for a censored DV, log-rank scores can serve as the influence function.
A small example of a permutation test in `R`:
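A minimal sketch, assuming a numeric DV `y` and a single numeric covariate `x`, with the absolute correlation as test statistic (the data and the number of permutations are illustrative assumptions):

```r
# Permutation test of independence between y and a single covariate x.
set.seed(1)
x <- rnorm(20)
y <- 0.5 * x + rnorm(20)                      # y depends on x

obs  <- cor(x, y)                             # observed test statistic
perm <- replicate(9999, cor(x, sample(y)))    # null distribution: permute y

# two-sided p-value: how often is a permuted statistic at least as extreme?
p <- (sum(abs(perm) >= abs(obs)) + 1) / (9999 + 1)
p
```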
Now suppose you have a set of covariates, not only one as above. Then calculate a p-value for each covariate as in the scheme above, and select the one with the smallest p-value. You work with p-values instead of comparing the test statistics directly, because the covariates can be of different kinds (e.g. numeric and categorical), so their statistics are not directly comparable.
Once you have selected a covariate, explore all possible splits (or, often, a somehow restricted number of the possible splits, e.g. by requiring a minimal number of observations in each resulting node), again evaluating a permutation-based test.
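A hedged sketch of such a restricted split search; the difference-in-means statistic and the quantile-based candidate cutpoints are illustrative choices, not ctree's exact procedure:

```r
# Evaluate each candidate cutpoint of the selected covariate x with a
# permutation test of the difference in mean response between the two nodes.
set.seed(2)
x <- rnorm(40)
y <- ifelse(x > 0, 1, 0) + rnorm(40, sd = 0.5)

stat <- function(x, y, cut) abs(mean(y[x <= cut]) - mean(y[x > cut]))

# restrict candidate splits to interior quantiles of x
cuts  <- quantile(x, probs = seq(0.2, 0.8, by = 0.1))
pvals <- sapply(cuts, function(cut) {
  obs  <- stat(x, y, cut)
  perm <- replicate(999, stat(x, sample(y), cut))
  (sum(perm >= obs) + 1) / 1000
})
cuts[which.min(pvals)]   # the cutpoint with the strongest evidence
```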
`ctree` comes with a number of possible transformations for both the DV and the covariates (see the help for `Transformations` in the `party` package).

So, generally, the main difference seems to be that
`ctree` uses a covariate-selection scheme that is based on statistical theory (i.e. selection by permutation-based significance tests) and thereby avoids a potential bias in `rpart`; otherwise the two seem similar. For example, conditional inference trees can be used as base learners for random forests.

This is about as far as I can get. For more information, you really need to read the papers. Note that I strongly recommend that you really know what you're doing when you want to apply any kind of statistical analysis.
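To make the comparison concrete, a minimal usage sketch of fitting both kinds of tree; the dataset (`iris`) is an arbitrary choice for illustration:

```r
library(rpart)   # traditional CART-style trees
library(party)   # conditional inference trees (ctree)

fit_rpart <- rpart(Species ~ ., data = iris)   # Gini-based splits
fit_ctree <- ctree(Species ~ ., data = iris)   # permutation-test-based splits

print(fit_rpart)
plot(fit_ctree)
```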