XGBClassifier – Utilizing Sample Weights in Machine Learning

boosting, classification, machine learning, scikit-learn

I am using the Scikit-Learn XGBClassifier API with sample weights. If I multiply the sample weights by 2, I get totally different results with the exact same parameters and random_state. I expect that if we multiply/divide the sample weights by a constant, the results should not change. Do you have any suggestions?

Best Answer

What you describe, while somewhat unusual, is not unexpected if we do not optimise our XGBoost routine adequately. Your intuition, though, is correct: "results should not change".

When we change the scale of the sample weights, we change the weighted deviance residuals associated with each data point; i.e. using a different sample-weight scale effectively has our GBM train on a different sample. At each gradient-boosting iteration, the residuals that determine the leaf values are multiplied by those sample weights. Therefore the fits themselves are different, especially during the first few iterations of XGBoost. Usually the difference in fit due to a different sample-weight scale is not substantial and will ultimately smooth out, but it can be noticeable (especially during the first iterations).
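To see the mechanism, recall that for squared-error loss XGBoost sets a leaf's value to -sum(w*g) / (sum(w*h) + reg_lambda), where g and h are the gradient and hessian of the loss (h = 1 for squared error). The regularisation term reg_lambda does not scale with the weights, which is why scaled weights produce different early fits. A minimal sketch with toy gradient values (the numbers are illustrative, not from the example below):

```python
import numpy as np

# Toy per-point gradients/hessians at some boosting iteration
# (h = 1 for squared-error loss).
g = np.array([1.0, 2.0, 3.0])
h = np.ones_like(g)
w = np.ones_like(g)

def leaf_value(w, g, h, reg_lambda):
    # XGBoost's optimal leaf value for a leaf containing these points:
    # the regulariser sits in the denominator and does NOT scale with w.
    return -np.sum(w * g) / (np.sum(w * h) + reg_lambda)

print(leaf_value(w, g, h, 0.1))      # -6/3.1  ≈ -1.9355
print(leaf_value(2 * w, g, h, 0.1))  # -12/6.1 ≈ -1.9672, a different leaf
print(leaf_value(w, g, h, 0.0), leaf_value(2 * w, g, h, 0.0))  # both -2.0
```

With reg_lambda = 0 the constant cancels and the leaf values are identical; with any positive reg_lambda (the example below uses 0.1) the two weight scales give different trees early on.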

# Using Python 3.6.9 // xgboost 0.90
import pandas as pd
import numpy as np
from xgboost import XGBRegressor

w = np.ones(10)   # unit sample weights
w_2 = w * 2       # the same weights, scaled by a constant
X = pd.DataFrame([13.36, 5.35, 0.26, 84.16, 24.67, 22.26, 18.02, 14.20, 61.66, 57.26])
y = pd.DataFrame([37.54, 14.54, -0.72, 261.19, 76.90, 67.15, 53.89, 43.48, 182.60, 179.44])
X_test = pd.DataFrame([0.5])

xgb_model = XGBRegressor(n_estimators=100, learning_rate=1,
                         objective='reg:squarederror', subsample=1, reg_lambda=0.1)

xgb_model.fit(X, y, sample_weight=w)
print(xgb_model.predict(X_test, ntree_limit=5)) 
# [-0.65936375]
print(xgb_model.predict(X_test))
# [-0.71998453]

xgb_model.fit(X, y, sample_weight=w_2)
print(xgb_model.predict(X_test, ntree_limit=5))
# [-0.76515234]
print(xgb_model.predict(X_test))
# [-0.7199712]

As we can see, using either the initial "unit weights" w or their scaled version w_2 returns effectively the same estimate (~-0.7199...) once we optimise "enough" (e.g. after 100 iterations). Nevertheless, early on the estimates can be substantially different (-0.6593... versus -0.7651...). (Note that the observed behaviour is somewhat version dependent. I played with XGBoost ver. 1.0.1 and the difference tapers off very quickly, at about ntree_limit=4.)

If we observe a substantial difference between the estimates of two boosters whose only difference is the scale of the sample weights, this is primarily indicative of two things:

  1. we have not optimised the boosters adequately, so they have not yet reached a steady state. We need to optimise further (e.g. run more iterations).
  2. we have over-fitted our samples, so the boosters interpret the difference in the scaling of the sample weights as a material difference. We need to regularise more strongly (e.g. use higher regularisation parameters reg_alpha and reg_lambda).
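Point 1 can be sketched with a toy one-leaf (constant-prediction) booster on squared error; this is a simplification of what XGBoost does, not its API, but it reproduces the behaviour above: the two weight scales disagree at the first iteration and agree once we iterate enough.

```python
import numpy as np

def boost_constant(y, w, n_rounds, reg_lambda=0.1):
    # Toy booster: each round fits a single constant "leaf" using
    # XGBoost's leaf formula -sum(w*g) / (sum(w*h) + reg_lambda),
    # with g = pred - y and h = 1 for squared-error loss.
    pred = 0.0
    for _ in range(n_rounds):
        g = pred - y
        pred += -np.sum(w * g) / (np.sum(w) + reg_lambda)
    return pred

y = np.array([1.0, 2.0, 3.0, 10.0])
w = np.ones_like(y)

early_1x = boost_constant(y, w, n_rounds=1)
early_2x = boost_constant(y, 2 * w, n_rounds=1)
late_1x = boost_constant(y, w, n_rounds=50)
late_2x = boost_constant(y, 2 * w, n_rounds=50)

print(early_1x, early_2x)  # differ: reg_lambda does not scale with the weights
print(late_1x, late_2x)    # effectively equal after enough rounds
```

Both runs converge to the same fixed point (here the mean of y), because at the fixed point the weighted residuals sum to zero regardless of the constant multiplying the weights; only the path there differs.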