Solved – meant by ‘weak learner’

adaboostclassificationpac-learningsvmterminology

Can anyone tell me what is meant by the phrase 'weak learner'? Is it supposed to be a weak hypothesis? I am confused about the relationship between a weak learner and a weak classifier. Are both the same or is there some difference?

In the adaboost algorithm, T=10. What is meant by that? Why do we select T=10?

Best Answer

A 'weak' learner (classifer, predictor, etc) is just one which performs relatively poorly--its accuracy is above chance, but just barely. There is often, but not always, the added implication that it is computationally simple. Weak learner also suggests that many instances of the algorithm are being pooled (via boosting, bagging, etc) together into to create a "strong" ensemble classifier.

It's mentioned in the original AdaBoost paper by Freund & Schapire:

Perhaps the most surprising of these applications is the derivation of a new application for "boosting", i.e., converting a "weak" PAC learning algorithm that performs just slightly better than random guessing into one with arbitrarily high accuracy. --(Freund & Schapire, 1995)

but I think the phrase is actually older than that--I've seen people cite a term paper(?!) by Michael Kearns from the 1980s.

The classic example of a Weak Learner is a Decision Stump, a one-level decision tree (1R or OneR is another commonly-used weak learner; it's fairly similar). It would be somewhat strange to call a SVM a 'weak learner', even in situations where it performs poorly, but it would be perfectly reasonable to call a single decision stump a weak learner even when it performs surprisingly well by itself.


Adaboost is an iterative algorithm and $T$ typically denotes the number of iterations or "rounds". The algorithm starts by training/testing a weak learner on the data, weighting each example equally. The examples which are misclassified get their weights increased for the next round(s), while those that are correctly classified get their weights decreased.

I'm not sure there's anything magical about $T=10$. In the 1995 paper, $T$ is given as a free parameter (i.e., you set it yourself).

Related Question