Solved – What is "instance weight" in boosting?

boosting, machine-learning, weighted-data


Hi, I am reading about boosting, and I had a hard time understanding one of the steps: "assign greater weights to those instances."

What does "assign greater weights to those instances" mean? My understanding is as follows. For example:

Initially, we have training data $(x_1,y_1), (x_2,y_2), (x_3,y_3), (x_4,y_4), (x_5,y_5)$.
After we first apply the weak learner, we find that $(x_2,y_2)$ and $(x_3,y_3)$ are misclassified, so we adjust the training data by "assigning the weights," and the new training data becomes something like $(x_1,y_1), (x_2,y_2), (x_2,y_2), (x_3,y_3), (x_3,y_3), (x_4,y_4), (x_5,y_5)$, where the misclassified instances appear more often. Does that mean the next learner has more chances to learn the misclassified ones?

Best Answer

"Instance" is just a somewhat confusing way of saying "case" or "person" or "observation," etc.

Imagine we have N data points we are trying to predict; each of those data points would be an "instance." If our data look like:

  y x
1 1 4
2 0 2
3 0 3
4 1 3
5 1 3

Then we have 5 "instances" and each row (observation, case, etc.) represents an instance. Imagine we predict y from x using a weak learner. We find that instance #3 (y = 0, x = 3) is classified incorrectly. In the next iteration, we would weight that instance higher than the others.
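To make "weight that instance higher" concrete, here is a minimal sketch of an AdaBoost-style weight update on the toy table above. The weak learner's rule (predict y = 1 when x ≥ 3) is an assumption chosen so that only instance #3 is misclassified, matching the answer's scenario:

```python
import math

# Toy data from the table above: 5 instances (rows).
y = [1, 0, 0, 1, 1]
x = [4, 2, 3, 3, 3]

# Start with uniform weights over the N = 5 instances.
n = len(y)
w = [1.0 / n] * n

# Assumed weak learner: predict y = 1 whenever x >= 3.
pred = [1 if xi >= 3 else 0 for xi in x]
miss = [pi != yi for pi, yi in zip(pred, y)]  # only instance #3 is wrong

# AdaBoost-style update: weighted error, learner coefficient alpha,
# then up-weight the misclassified instances and renormalize.
err = sum(wi for wi, m in zip(w, miss) if m)
alpha = 0.5 * math.log((1 - err) / err)
w = [wi * math.exp(alpha if m else -alpha) for wi, m in zip(w, miss)]
total = sum(w)
w = [wi / total for wi in w]

print(w)  # the misclassified instance now carries more weight
```

With these numbers the misclassified instance #3 ends up with weight 0.5 while the four correctly classified instances each get 0.125, so the next learner pays much more attention to instance #3.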

I wouldn't necessarily say that the learner "has more chances to learn the misclassified ones," as every instance/case/row/observation is included in each iteration. It is just that subsequent learners focus more on misclassified instances.
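That said, the duplication intuition in the question is not far off: training on weighted instances is numerically equivalent to training on a resampled dataset where high-weight instances appear more often. A small sketch, using the weights from the scenario above (0.5 on the misclassified instance, 0.125 on the rest, which are assumed values for illustration):

```python
# Weighted error on the original data vs plain error on a resampled
# dataset where the misclassified instance is duplicated.
y = [1, 0, 0, 1, 1]
pred = [1, 0, 1, 1, 1]                  # instance #3 (index 2) is wrong
w = [0.125, 0.125, 0.5, 0.125, 0.125]   # weights after one boosting round

weighted_err = sum(wi for wi, yi, pi in zip(w, y, pred) if yi != pi)

# Duplicating instance #3 three extra times gives each of the 8 rows
# weight 1/8, so the plain error rate matches the weighted error above.
y_dup = y + [y[2]] * 3
pred_dup = pred + [pred[2]] * 3
plain_err = sum(yi != pi for yi, pi in zip(y_dup, pred_dup)) / len(y_dup)

print(weighted_err, plain_err)  # both 0.5
```

So weighting and duplication express the same idea; weighting just does it without physically copying rows, and every instance stays in the training set at every iteration.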