On the “strength” of weak learners

boosting, ensemble learning, machine learning

I have several closely-related questions regarding weak learners in ensemble learning (e.g. boosting).

  1. This may sound dumb, but what are the benefits of using weak as opposed to strong learners? (e.g. why not boost with "strong" learning methods?)
  2. Is there some sort of "optimal" strength for the weak learners (e.g. while keeping all the other ensemble parameters fixed)? Is there a "sweet spot" when it comes to their strength?
  3. How can we measure the strength of a weak learner relative to that of the resulting ensemble? How do we quantitatively measure the marginal benefit of using an ensemble?
  4. How do we compare several weak learning algorithms to decide which one to use for a given ensemble method?
  5. If a given ensemble method helps weak classifiers more than strong ones, how do we tell a given classifier is already "too strong" to yield any significant gains when boosting with it?

Best Answer

This may be more in the spirit of bagging, but nevertheless:

  • If you really have a strong learner, there is no need to improve it with any ensemble method.
  • I would say... irrelevant. In blending and bagging this is trivially so; in boosting, making the base classifier too strong may lead to some breaches in convergence (i.e. a lucky prediction may leave the next iteration fitting pure noise and thus decrease performance), but this is usually repaired in subsequent iterations.
  • Again, this is not the real problem. The very core of those methods is to

    1. force the partial classifiers to look deeper into the problem.
    2. combine their predictions to attenuate the noise and amplify the signal.

    Point 1 needs some attention in boosting (i.e. a good boosting scheme and a well-behaved partial learner, but this is mostly to be judged by experiments on the whole boosted ensemble). Point 2 needs attention in bagging and blending (mostly how to ensure a lack of correlation between learners without adding too much noise to the ensemble). As long as this is OK, the accuracy of the partial classifier is a third-order problem.
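
To put a rough number on the "attenuate the noise and amplify the signal" point, here is a minimal sketch under a strong idealisation that is not part of the answer above: binary classification, an odd number of voters, and errors that are fully independent between learners. The function name `majority_vote_accuracy` and the chosen accuracies are purely illustrative. Real bagged or boosted learners are correlated, so actual gains will usually be smaller than this model suggests.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Accuracy of a majority vote over n voters.

    Assumes each voter is correct with probability p and that
    the voters' errors are independent (an idealisation).
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    k_min = n // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )


if __name__ == "__main__":
    # Compare the marginal gain for weak, medium and strong base learners.
    for p in (0.55, 0.75, 0.95):
        ensemble = majority_vote_accuracy(p, 25)
        print(f"single={p:.2f}  ensemble={ensemble:.3f}  gain={ensemble - p:+.3f}")
```

Under these assumptions the absolute gain is capped at 1 - p, so a learner that is already strong has little room left to improve, while one at exactly p = 0.5 gains nothing from voting. That is consistent with the first bullet, and it is also why decorrelating the learners tends to matter more than the accuracy of any single one.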
