Ensemble Learning – Why Not Always Use Ensemble Learning?

Tags: bagging, boosting, ensemble-learning

It seems to me that ensemble learning will always give better predictive performance than just a single learning hypothesis.

So, why don't we use them all the time?

My guess is that it's perhaps due to computational limitations? (Even then, we use weak predictors, so I don't know.)

Best Answer

In general it is not true that an ensemble will always perform better. There are several ensemble methods, each with its own advantages and weaknesses, and which one to use depends on the problem at hand.

For example, if you have models with high variance (they over-fit your data), then you are likely to benefit from bagging. If you have biased models, it is better to combine them with boosting, as the sketch below illustrates. There are also different strategies for forming ensembles; the topic is just too wide to cover in one answer.
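To make this concrete, here is a minimal sketch (assuming scikit-learn is available; the synthetic dataset and hyper-parameters are arbitrary placeholders) that pairs each ensemble with the kind of base learner it suits: bagging a fully grown, high-variance tree, and boosting a high-bias decision stump.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# High-variance base learner: a fully grown tree that over-fits on its own.
deep_tree = DecisionTreeClassifier(max_depth=None, random_state=0)
bagged_trees = BaggingClassifier(deep_tree, n_estimators=100, random_state=0)

# High-bias base learner: a decision stump that under-fits on its own.
stump = DecisionTreeClassifier(max_depth=1, random_state=0)
boosted_stumps = AdaBoostClassifier(stump, n_estimators=100, random_state=0)

for name, model in [("single deep tree", deep_tree),
                    ("bagged deep trees", bagged_trees),
                    ("single stump", stump),
                    ("boosted stumps", boosted_stumps)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```

On data like this you would typically expect each ensemble to beat its own single base learner, which is the matching of method to model that the paragraph above describes.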

But my point is: if you use the wrong ensemble method for your setting, you are not going to do better. For example, using bagging with a biased model is not going to help.
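A quick way to see the negative case is to bag that same high-bias stump (again a sketch with scikit-learn on placeholder data): averaging many stumps typically changes little, because averaging reduces variance, not bias.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

stump = DecisionTreeClassifier(max_depth=1, random_state=0)
bagged_stumps = BaggingClassifier(stump, n_estimators=100, random_state=0)

print("single stump :", cross_val_score(stump, X, y, cv=5).mean())
print("bagged stumps:", cross_val_score(bagged_stumps, X, y, cv=5).mean())
```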

Also, if you need to work in a probabilistic setting, ensemble methods may not be suitable either. It is known that boosting (in its most popular forms, like AdaBoost) delivers poor probability estimates. That is, if you want a model that lets you reason about your data rather than only classify it, you might be better off with a graphical model.
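If calibrated probabilities do matter to you, one thing you can do is check them directly. The sketch below (again assuming scikit-learn; the data is a placeholder) compares the Brier score of AdaBoost's raw predict_proba output with the same model wrapped in explicit post-hoc calibration.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Raw AdaBoost probabilities.
ada = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
p_raw = ada.predict_proba(X_te)[:, 1]

# The same model with its scores post-calibrated on held-out folds.
calibrated = CalibratedClassifierCV(
    AdaBoostClassifier(n_estimators=200, random_state=0), method="isotonic", cv=5
).fit(X_tr, y_tr)
p_cal = calibrated.predict_proba(X_te)[:, 1]

print("Brier score, raw AdaBoost       :", brier_score_loss(y_te, p_raw))
print("Brier score, calibrated AdaBoost:", brier_score_loss(y_te, p_cal))
```

Lower Brier scores mean better-calibrated probabilities; if the raw ensemble's score is noticeably worse, that is the symptom the answer is warning about.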