Solved – Why does a decision tree have low bias & high variance

Tags: bias, cart, covariance, machine learning, variance

Questions

  1. Does it depend on whether the tree is shallow or deep? Or can we say this irrespective of the depth/levels of the tree?
  2. Why is bias low & variance high? Please explain intuitively and mathematically

Best Answer

A bit late to the party, but I feel this question could use an answer with concrete examples.

I will summarize this excellent article: bias-variance-trade-off, which helped me understand the topic.

The prediction error for any machine learning algorithm can be broken down into three parts:

  • Bias Error
  • Variance Error
  • Irreducible Error
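
To make the "mathematically" part of question 2 concrete: for regression under squared loss, the expected error at a point $x$ decomposes into exactly these three terms (the notation here is mine, not the linked article's; the expectation is taken over training sets):

$$\mathbb{E}\left[(y - \hat{f}(x))^2\right] = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2} + \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}} + \underbrace{\sigma^2}_{\text{irreducible error}}$$

where $f$ is the true function, $\hat{f}$ is the model learned from a training set, and $y = f(x) + \varepsilon$ with noise variance $\sigma^2$.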

Irreducible error

As the name implies, this is an error component that we cannot correct, regardless of the algorithm and its parameter selection. Irreducible error is due to complexities that are simply not captured in the training set. For example, there may be attributes we do not have in the training set that nevertheless affect the mapping to the outcome.

Bias error

Bias error is due to our assumptions about the target function. The more assumptions (restrictions) we make about the target function, the more bias we introduce. Models with high bias are less flexible because we have imposed more rules on the form of the target function.

Variance error

Variance error is the variability of the target function's form with respect to different training sets. Models with a small variance error will not change much if you replace a couple of samples in the training set. Models with high variance might be affected even by small changes in the training set.

Consider simple linear regression:

$$Y = b_0 + b_1 x$$

Obviously, this is a fairly restrictive assumption about the target function, and therefore this model has high bias.

It also has low variance: if you change a couple of data samples, it is unlikely that this will cause major changes in the overall mapping the model performs. On the other hand, an algorithm such as k-nearest-neighbors has high variance and low bias; it is easy to imagine how different samples might reshape the k-NN decision surface.

Generally, parametric algorithms have high bias and low variance, while non-parametric algorithms tend to have low bias and high variance.
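
You can probe this claim empirically. The following is a minimal sketch (mine, not from the article) that refits each model on bootstrap resamples of the same synthetic data and measures how much its predictions move at fixed test points; a larger spread means higher variance. The dataset, seed, and sizes are illustrative assumptions.

```python
# Sketch (assumed synthetic data): compare prediction variance of a
# parametric model (linear regression) vs. a non-parametric one (1-NN)
# by refitting each on bootstrap resamples and measuring prediction spread.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))                    # synthetic inputs
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)  # noisy target
x_grid = np.linspace(-3, 3, 50).reshape(-1, 1)           # fixed test points

def prediction_spread(make_model, n_boot=100):
    """Mean std. dev. of predictions at fixed points across bootstrap refits."""
    preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))       # bootstrap resample
        preds.append(make_model().fit(X[idx], y[idx]).predict(x_grid))
    return np.std(preds, axis=0).mean()

print("linear regression spread:", prediction_spread(LinearRegression))
print("1-NN spread:             ", prediction_spread(lambda: KNeighborsRegressor(n_neighbors=1)))
```

The 1-NN spread comes out much larger: each refit memorizes a different resample, so its predictions jump around, while the line barely moves.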

One of the challenges of machine learning is finding the right balance of bias error and variance error.

Decision tree

Now that we have these definitions in place, it is straightforward to see that decision trees are an example of a model with low bias and high variance. A tree makes almost no assumptions about the target function, but it is highly susceptible to variance in the data. Regarding question 1, depth does matter: a shallow tree (in the extreme, a single-split stump) is more restricted, so it has higher bias and lower variance, while a deep, fully grown tree can fit almost any training set, giving it low bias and high variance. The usual claim refers to deep, unpruned trees.
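
Here is a sketch of that depth effect (again mine, with made-up data): it estimates squared bias against the known true function and variance across bootstrap refits, for a stump versus a fully grown tree.

```python
# Sketch of question 1 (assumed synthetic data): estimate squared bias and
# variance for a stump (max_depth=1) vs. a fully grown tree, by refitting
# on bootstrap resamples and comparing to the known true function sin(x).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)
x_grid = np.linspace(-3, 3, 50).reshape(-1, 1)

for depth in (1, None):  # stump vs. unrestricted tree
    preds = []
    for _ in range(100):
        idx = rng.integers(0, len(X), size=len(X))   # bootstrap resample
        tree = DecisionTreeRegressor(max_depth=depth).fit(X[idx], y[idx])
        preds.append(tree.predict(x_grid))
    preds = np.asarray(preds)
    bias2 = np.mean((preds.mean(axis=0) - np.sin(x_grid).ravel()) ** 2)
    variance = preds.var(axis=0).mean()
    print(f"max_depth={depth}: bias^2={bias2:.3f}, variance={variance:.3f}")
```

The stump shows high squared bias and low variance; the unrestricted tree shows the opposite, which is the pattern the question asks about.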

There are ensemble methods, such as bootstrap aggregation (bagging) and random forests, which aim to reduce the variance of decision trees at a small cost in bias.
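
A minimal illustration of that variance reduction (my sketch, with assumed data and settings, not the article's code):

```python
# Sketch (assumed data/settings): bagging and random forests average many
# high-variance trees, shrinking the spread of predictions across refits.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)
x_grid = np.linspace(-3, 3, 50).reshape(-1, 1)

models = {
    "single tree":   lambda: DecisionTreeRegressor(),
    "bagged trees":  lambda: BaggingRegressor(DecisionTreeRegressor(), n_estimators=50),
    "random forest": lambda: RandomForestRegressor(n_estimators=50),
}
for name, make_model in models.items():
    preds = []
    for _ in range(30):  # refit on bootstrap resamples
        idx = rng.integers(0, len(X), size=len(X))
        preds.append(make_model().fit(X[idx], y[idx]).predict(x_grid))
    print(f"{name}: prediction std = {np.std(preds, axis=0).mean():.3f}")
```

Averaging over trees trained on different bootstrap samples cancels much of the tree-to-tree variability, which is exactly the mechanism these ensembles exploit.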