Solved – XGBoost implementation for unbalanced data using scale_pos_weight parameter

Tags: boosting, mathematical-statistics, unbalanced-classes

I am confused about how a cost-sensitive custom metric can be used when training on an unbalanced dataset (two classes, 0 and 1) in XGBoost.

Metric: Cost = 10 * (# of false positives) + 500 * (# of false negatives)
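
For concreteness, here is a minimal sketch (not part of the original question; the function name and the use of hard 0/1 predictions are illustrative assumptions) of that cost as a plain Python function:

    import numpy as np

    def misclassification_cost(y_true, y_pred):
        # Cost = 10 * (# false positives) + 500 * (# false negatives)
        y_true = np.asarray(y_true)
        y_pred = np.asarray(y_pred)
        fp = np.sum((y_pred == 1) & (y_true == 0))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        return 10 * fp + 500 * fn

    # e.g. misclassification_cost([1, 1, 0, 0], [0, 1, 1, 0]) -> 500 + 10 = 510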

Can anyone help me understand how exactly the parameter 'scale_pos_weight' is used while training in XGBoost?

Following is my interpretation. Please correct me if I'm wrong.

objective function: binary:logistic

Case 1: when scale_pos_weight = 1 (the default)
In this case both classes 0 and 1 are treated equally, so while updating the model during training the gradient contributions are the same for both classes.

Case 2: when scale_pos_weight = 60
In this case the weight for class 1 is 60 times that for class 0, so while updating the model the gradient contributions from class 1 examples are correspondingly larger than those from class 0 examples.
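
As a rough illustration of case 2 (my own sketch, not from the question), this is how a positive-class weight w scales the gradient and hessian of the logistic loss, which is effectively what scale_pos_weight does inside binary:logistic:

    import numpy as np

    def weighted_logistic_grad_hess(margin, label, w=60.0):
        # margin: raw model output before the sigmoid; label: array of 0s and 1s
        p = 1.0 / (1.0 + np.exp(-margin))        # predicted probability
        weight = np.where(label == 1, w, 1.0)    # class-1 rows get w times the weight
        grad = weight * (p - label)              # first-order term used to grow trees
        hess = weight * p * (1.0 - p)            # second-order term
        return grad, hess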

Since eval_metric does not contribute to training, even if I use a class-sensitive cost as the evaluation metric it will not help me unless I also use the parameter 'scale_pos_weight'.
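
For reference, a small sketch (again not from the original post) of two common ways the value of scale_pos_weight could be chosen here: the sum(negative) / sum(positive) heuristic mentioned in the XGBoost documentation, or the ratio of the two misclassification costs; which is appropriate depends on the application:

    import numpy as np

    np.random.seed(0)
    y_train = np.random.binomial(1, 0.05, size=1000)   # toy labels, roughly 5% positives

    w_class_ratio = np.sum(y_train == 0) / np.sum(y_train == 1)   # class-imbalance heuristic
    w_cost_ratio = 500 / 10                                        # FN cost / FP cost = 50
    print(w_class_ratio, w_cost_ratio)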

Is my interpretation correct?

Best Answer

The eval_metric and eval_set parameters only control the early stopping behaviour, i.e. the number of trees grown.

You may find this helpful on how XGBoost handles weights.
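
To make that distinction concrete, here is a minimal sketch (an assumption on my part, written against the native xgboost >= 1.6 API rather than anything stated in the original answer) in which scale_pos_weight enters the training parameters and therefore the trees themselves, while the custom cost is passed only as an evaluation metric and so only decides when early stopping halts:

    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X = rng.random((1000, 5))
    y = (rng.random(1000) < 0.05).astype(int)          # toy data, roughly 5% positives
    X_tr, y_tr = X[:800], y[:800]
    X_va, y_va = X[800:], y[800:]

    def cost_metric(preds, dmat):
        # preds are predicted probabilities for binary:logistic; the 0.5 threshold is an assumption
        labels = dmat.get_label()
        pred = (preds > 0.5).astype(int)
        fp = np.sum((pred == 1) & (labels == 0))
        fn = np.sum((pred == 0) & (labels == 1))
        return "cost", float(10 * fp + 500 * fn)

    params = {
        "objective": "binary:logistic",
        "scale_pos_weight": 50,   # e.g. the 500/10 cost ratio; scales positive-class gradients
        "eta": 0.1,
    }
    booster = xgb.train(
        params,
        xgb.DMatrix(X_tr, label=y_tr),
        num_boost_round=500,
        evals=[(xgb.DMatrix(X_va, label=y_va), "valid")],
        custom_metric=cost_metric,      # only monitored to decide when to stop
        early_stopping_rounds=20,
        verbose_eval=False,
    )

In this setup, changing cost_metric only changes the stopping point (the number of trees kept), whereas changing scale_pos_weight changes the gradients used to grow every tree.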
