Solved – How to use decision stumps as weak learners in AdaBoost

boosting, cart, classification, machine-learning

I want to implement AdaBoost using decision stumps. Is it correct to build as many decision stumps as our data set has features in each iteration of AdaBoost?

For example, if I have a data set with 24 features, should I have 24 decision stump classifiers in each iteration? Or should I randomly choose some features and build classifiers on them instead of using all of the features?

Best Answer

The typical way of training a (1-level) decision tree, i.e. a decision stump, is to find the attribute and threshold that give the purest split: if we split our dataset into two subsets, we want the labels inside these subsets to be as homogeneous as possible. So it can also be seen as building many trees, one for each attribute, and then selecting the tree that produces the best split.
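
As an illustration of that search (a brute-force sketch of my own, not part of the original answer; the function name find_stump and its return format are assumptions), a stump can be found by trying every feature, every candidate threshold, and both polarities, and keeping the combination with the lowest weighted error:

import numpy as np

def find_stump(X, y, d):
    """Exhaustively search for the (feature, threshold, polarity)
    that minimizes the weighted 0-1 error under point weights d."""
    N, n_features = X.shape
    best = (None, None, 1, np.inf)  # (feature, threshold, polarity, error)
    for j in range(n_features):
        for thr in np.unique(X[:, j]):
            for polarity in (+1, -1):
                # predict +polarity on the left of the threshold, -polarity on the right
                pred = np.where(X[:, j] <= thr, polarity, -polarity)
                err = d.dot(pred != y)  # weighted misclassification rate
                if err < best[3]:
                    best = (j, thr, polarity, err)
    return best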

In some cases it also makes sense to select a subset of attributes and train trees on that subset; for example, Random Forests do this to reduce the correlation between individual trees.

But when it comes to AdaBoost, it is typically enough to make sure the base classifier can be trained on weighted data points; random feature selection is less important. Decision trees can handle weights (see e.g. here or here): this may be done by weighting the contribution of each data point to the total subset impurity.
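
To make the weighting concrete: in a weighted impurity, each point contributes its weight d[i] instead of a count of 1. A minimal sketch for the two-class case (my own illustration; the helper name weighted_gini is an assumption):

import numpy as np

def weighted_gini(y, d):
    """Gini impurity of labels y in {+1, -1}, where each point
    counts with weight d[i] instead of 1."""
    total = d.sum()
    p_pos = d[y == +1].sum() / total
    # for two classes, 1 - p_pos**2 - p_neg**2 simplifies to 2*p_pos*(1 - p_pos)
    return 2 * p_pos * (1 - p_pos)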

For reference, I'll also add my AdaBoost implementation in Python using NumPy and sklearn's DecisionTreeClassifier with max_depth=1:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# input: dataset X and labels y (in {+1, -1});
# num_iterations is the number of boosting rounds, chosen beforehand
hypotheses = []
hypothesis_weights = []

N, _ = X.shape
d = np.ones(N) / N  # start from uniform weights over the N training points

for t in range(num_iterations):
    # fit a decision stump to the currently weighted data
    h = DecisionTreeClassifier(max_depth=1)
    h.fit(X, y, sample_weight=d)
    pred = h.predict(X)

    # weighted training error of the stump and its vote weight
    eps = d.dot(pred != y)
    alpha = (np.log(1 - eps) - np.log(eps)) / 2

    # upweight misclassified points, downweight correct ones, renormalize
    d = d * np.exp(-alpha * y * pred)
    d = d / d.sum()

    hypotheses.append(h)
    hypothesis_weights.append(alpha)

For predicting the labels:

# X: inputs; y: aggregated prediction in {+1, -1}
y = np.zeros(X.shape[0])
for (h, alpha) in zip(hypotheses, hypothesis_weights):
    y = y + alpha * h.predict(X)
y = np.sign(y)  # sign of the weighted vote of all stumps
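
For a quick end-to-end check, here is the same logic wrapped into functions and run on synthetic data (a sketch of mine, not from the original answer; the names adaboost_fit and adaboost_predict and the make_classification setup are assumptions for illustration):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, num_iterations=50):
    N = X.shape[0]
    d = np.ones(N) / N
    hypotheses, hypothesis_weights = [], []
    for t in range(num_iterations):
        h = DecisionTreeClassifier(max_depth=1)
        h.fit(X, y, sample_weight=d)
        pred = h.predict(X)
        eps = d.dot(pred != y)
        alpha = (np.log(1 - eps) - np.log(eps)) / 2
        d = d * np.exp(-alpha * y * pred)
        d = d / d.sum()
        hypotheses.append(h)
        hypothesis_weights.append(alpha)
    return hypotheses, hypothesis_weights

def adaboost_predict(X, hypotheses, hypothesis_weights):
    score = np.zeros(X.shape[0])
    for h, alpha in zip(hypotheses, hypothesis_weights):
        score += alpha * h.predict(X)
    return np.sign(score)

# toy check on synthetic data with 24 features, as in the question
X, y = make_classification(n_samples=200, n_features=24, random_state=0)
y = 2 * y - 1  # map sklearn's {0, 1} labels to {+1, -1}
hs, ws = adaboost_fit(X, y)
print("train accuracy:", np.mean(adaboost_predict(X, hs, ws) == y))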