How is the complexity parameter (cp) calculated? What does it mean?
From what I have read, cp is a threshold: the tree keeps splitting nodes until the reduction in relative error falls below that value.
Some sources say cp affects only the growth of the tree, while others say it affects pruning too. To me it appears to affect only growth, but I am not sure.
I am using the rpart package to build trees. For classification trees there is the misclassification rate to evaluate the predictions, but for regression trees is there anything beyond the MSE to evaluate the predictions?
Best Answer
This is answered in this `rpart` resource, on p. 25. That same page gives this formula for how the `cp` parameter affects the calculation of a tree's risk:
$$R_{cp}(T) \equiv R(T) + cp \cdot |T| \cdot R(T_1)$$
(Here $T_1$ is the tree with no splits and $|T|$ is the number of splits in tree $T$. The full formal definition of risk is outside the scope of your question, but for reference it is given on p. 4.)
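To make the formula concrete, here is a minimal sketch in plain Python (not `rpart` itself) that evaluates $R_{cp}(T)$ for two candidate trees. The function name `penalized_risk` and all the risk values are hypothetical, chosen only for illustration; they do not come from the `rpart` resource.

```python
def penalized_risk(risk, n_splits, cp, root_risk):
    """Evaluate R_cp(T) = R(T) + cp * |T| * R(T_1).

    risk      -- R(T), the risk of the candidate tree (hypothetical value)
    n_splits  -- |T|, the number of splits in the candidate tree
    cp        -- the complexity parameter
    root_risk -- R(T_1), the risk of the tree with no splits
    """
    return risk + cp * n_splits * root_risk


# Hypothetical risks: a small tree with 1 split and a larger tree with 4 splits.
root_risk = 100.0
small_tree = {"risk": 60.0, "n_splits": 1}
large_tree = {"risk": 50.0, "n_splits": 4}

for cp in (0.05, 0.01):
    small = penalized_risk(small_tree["risk"], small_tree["n_splits"], cp, root_risk)
    large = penalized_risk(large_tree["risk"], large_tree["n_splits"], cp, root_risk)
    preferred = "small" if small < large else "large"
    print(f"cp={cp}: small={small:.2f}, large={large:.2f}, preferred={preferred}")
```

With these made-up numbers, the larger cp makes the three extra splits cost more than the risk they remove, so the small tree has the lower penalized risk; the smaller cp reverses that. In other words, each split carries a penalty of cp times the root risk, and a bigger tree must reduce $R(T)$ by more than that penalty to be worth keeping.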