Solved – How to explain the fact that “Bagging reduces the variance while retaining the bias” mathematically

bagging, bias-variance tradeoff, machine learning, random forest, resampling

I understand the intuition behind the statement that "bagging reduces the variance while retaining the bias".

What is the mathematical principle behind this intuition? I checked with a few experts, and they were not able to explain the math well.

Best Answer

It is quite surprising that the experts couldn't help you out; the chapter on random forests in "The Elements of Statistical Learning" explains it very well.

Basically, given n i.i.d. random variables, each with variance sigma², the variance of the mean of these variables is sigma²/n.
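As a quick sanity check of the sigma²/n rule, here is a minimal simulation sketch using only the standard library. The function name `variance_of_mean`, the choice of normally distributed draws, and sigma = 2 are my own assumptions for illustration; nothing here comes from the book.

```python
import random

def variance_of_mean(n, sigma=2.0, trials=20000, seed=0):
    """Empirical variance of the mean of n i.i.d. N(0, sigma^2) draws."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        draws = [rng.gauss(0.0, sigma) for _ in range(n)]
        means.append(sum(draws) / n)
    grand_mean = sum(means) / trials
    # Population variance of the simulated means.
    return sum((m - grand_mean) ** 2 for m in means) / trials

# Compare the simulated variance with the theoretical value sigma^2 / n.
for n in (1, 4, 16):
    print(n, round(variance_of_mean(n), 3), "theory:", 2.0 ** 2 / n)
```

The simulated values should sit close to 4, 1 and 0.25, up to sampling noise.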

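Since a random forest is built on bootstrap samples of the data, the outputs of the individual trees can be viewed as identically distributed (but not independent) random variables. Because they are identically distributed, the expectation of their average equals the expectation of any single tree, so averaging leaves the bias unchanged.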

Thus, when the outputs of B trees are averaged, the variance of the final prediction is p*sigma² + (1 - p)*sigma²/B, where sigma² is the variance of a single tree's output and p is the pairwise correlation between trees. For large B the second term vanishes and the variance reduces to p*sigma².
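The formula can be checked numerically. The sketch below is an illustration under my own assumptions: each "tree output" is built as a shared normal component plus an independent normal component, which gives every output variance sigma² and every pair correlation p. The function names are hypothetical, and real trees are of course not generated this way.

```python
import math
import random

def bagged_variance_theory(p, sigma, B):
    """Variance of the average of B outputs with pairwise correlation p."""
    return p * sigma ** 2 + (1 - p) * sigma ** 2 / B

def bagged_variance_simulated(p, sigma, B, trials=20000, seed=0):
    """Monte Carlo estimate of the same quantity (assumed Gaussian outputs)."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        shared = rng.gauss(0.0, 1.0)  # component common to all B outputs
        outputs = [
            sigma * (math.sqrt(p) * shared + math.sqrt(1 - p) * rng.gauss(0.0, 1.0))
            for _ in range(B)
        ]
        means.append(sum(outputs) / B)
    grand_mean = sum(means) / trials
    return sum((m - grand_mean) ** 2 for m in means) / trials

# As B grows, both columns should approach the floor p * sigma^2 = 0.5.
for B in (1, 10, 50):
    print(B, round(bagged_variance_simulated(0.5, 1.0, B), 3),
          "theory:", round(bagged_variance_theory(0.5, 1.0, B), 3))
```

With p = 0 the formula collapses to the i.i.d. case sigma²/B, and with p = 1 averaging does not help at all, which is why random forests try to decorrelate the trees.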

This works not only for decision trees but for every model that can be bagged. It works particularly well for decision trees because they inherently have low bias (no assumptions are made, e.g. of a linear relation between features and response) but very high variance.

Since only the variance can be reduced, decision trees are built to node purity in the context of random forests and tree bagging. (Building to node purity maximizes the variance of the individual trees, i.e. they fit the data perfectly, while minimizing the bias.)
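To see the effect on an actual low-bias, high-variance learner, here is a rough experiment sketch. As a stand-in for a tree grown to node purity I use a 1-nearest-neighbour regressor in one dimension, which likewise reproduces the training data exactly. The target function, noise level, sample size and all function names are assumptions chosen for illustration, and the bias is only expected to stay roughly the same, not exactly equal.

```python
import math
import random

def true_f(x):
    return math.sin(2 * math.pi * x)

def draw_training_set(rng, n=30, noise=0.5):
    """Sample n points (x, y) with y = true_f(x) + Gaussian noise."""
    xs = [rng.random() for _ in range(n)]
    return [(x, true_f(x) + rng.gauss(0.0, noise)) for x in xs]

def predict_1nn(train, x0):
    """Return the response of the training point closest to x0."""
    return min(train, key=lambda pt: abs(pt[0] - x0))[1]

def predict_bagged(train, x0, rng, B=50):
    """Average 1-NN predictions over B bootstrap resamples of train."""
    preds = []
    for _ in range(B):
        boot = [rng.choice(train) for _ in train]  # sample n points with replacement
        preds.append(predict_1nn(boot, x0))
    return sum(preds) / B

def bias_and_variance(preds, target):
    mean = sum(preds) / len(preds)
    var = sum((p - mean) ** 2 for p in preds) / len(preds)
    return mean - target, var

def experiment(x0=0.5, reps=300, seed=0):
    """Estimate bias and variance at x0 over many independent training sets."""
    rng = random.Random(seed)
    single, bagged = [], []
    for _ in range(reps):
        train = draw_training_set(rng)
        single.append(predict_1nn(train, x0))
        bagged.append(predict_bagged(train, x0, rng))
    target = true_f(x0)
    return bias_and_variance(single, target), bias_and_variance(bagged, target)

(single_bias, single_var), (bagged_bias, bagged_var) = experiment()
print("single learner: bias", round(single_bias, 3), "variance", round(single_var, 3))
print("bagged learner: bias", round(bagged_bias, 3), "variance", round(bagged_var, 3))
```

In this setup the bagged variance should come out clearly below the single-learner variance, while both bias estimates stay near zero.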
