Solved – Understanding regularization in xgboost

boosting, cart, machine learning, regularization

A general loss function is:

$$\text{obj} = \sum_{i=1}^n l(y_i, \hat{y}_i^{(t)}) + \sum_{i=1}^t \Omega(f_i)$$

which is prediction cost + regularization cost
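
As a concrete reading of the formula, here is a minimal sketch (toy code, not XGBoost's internals) that sums a per-example loss over the data and adds a per-tree penalty; the squared-error loss and the `omega` callable are assumptions made purely for illustration.

```python
import numpy as np

def objective(y, y_pred, trees, omega):
    """Toy obj = sum_i l(y_i, y_hat_i^(t)) + sum_t Omega(f_t),
    using squared error as the (assumed) per-example loss l."""
    prediction_cost = np.sum((y - y_pred) ** 2)               # sum_i l(y_i, y_hat_i)
    regularization_cost = sum(omega(tree) for tree in trees)  # sum_t Omega(f_t)
    return prediction_cost + regularization_cost

# Tiny illustration: each "tree" is represented here only by a precomputed penalty.
print(objective(np.array([1.0, 2.0]), np.array([0.9, 2.2]),
                trees=[0.5, 0.8], omega=lambda t: t))
```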

A decision tree is defined as:

$f_t(x) = w_{q(x)},\quad w \in \mathbb{R}^T,\quad q:\mathbb{R}^d\rightarrow \{1,2,\cdots,T\}$

Here $w$ is the vector of scores on the leaves, $q$ is a function assigning each data point to its corresponding leaf, and $T$ is the number of leaves. In XGBoost, we define the complexity as

$\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^T w_j^2$

So it seems to me that the $w$'s are the final prediction scores for each leaf made by the decision tree. Under this understanding, XGBoost is penalizing its most confident predictions, even if they are correct, as part of the regularization term in the cost function.
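
To make the notation concrete, here is a minimal sketch of a single tree written as $(q, w)$ together with its penalty; the tree, the scores, and the $\gamma$, $\lambda$ values are all illustrative assumptions (in the xgboost library they correspond to the `gamma` and `lambda` / `reg_lambda` parameters). Two trees with the same number of leaves get different penalties purely because of the magnitude of their leaf scores:

```python
import numpy as np

# Toy tree in the notation above: w is the vector of leaf scores,
# q maps a data point to a leaf index in {1, ..., T}.
w = np.array([0.1, -0.2, 0.3])                      # T = 3 leaf scores
q = lambda x: 1 if x[0] < 0 else (2 if x[1] < 0 else 3)

def f_t(x):
    """Prediction of this single tree: f_t(x) = w_{q(x)}."""
    return w[q(x) - 1]

def omega(leaf_scores, gamma=1.0, lam=1.0):
    """Omega(f) = gamma * T + 0.5 * lambda * sum_j w_j^2."""
    leaf_scores = np.asarray(leaf_scores, dtype=float)
    return gamma * leaf_scores.size + 0.5 * lam * np.sum(leaf_scores ** 2)

print(f_t(np.array([-1.0, 2.0])))        # x falls in leaf 1, score 0.1
print(omega([0.1, -0.2, 0.3]))           # small leaf scores -> small penalty
print(omega([2.0, -3.0, 4.0]))           # same T, larger (more confident) scores -> larger penalty
```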

I am not sure this even qualifies as a question; I am mostly looking for someone to tell me whether I am reading this right, since I have never seen the confidence of a model being penalized under regularization.

Also, aren't the two parts of the cost function contradictory in some sense, with one part trying to be more confident and the other part (the regularization part) trying to be less confident?

Best Answer

The contradiction you noticed is precisely the idea of regularization: you want to exchange confidence on the training set for confidence on the test set, since being too confident on the training set does not imply the model will generalize well; indeed, you could simply be fitting noise in the training set. When you penalize the weights, you usually end up with a simpler model (imagine that some weights may shrink to zero) that may perform worse on the training set but performs better on unseen data.
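
As a small, hedged illustration of that trade-off using the xgboost Python package (the synthetic dataset and the $\lambda$ values are assumptions made just for the example): increasing `reg_lambda` shrinks the leaf scores, so the training error usually does not drop as low, and on noisy data the held-out error often improves.

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Noisy synthetic data: a model that is too confident on the training set
# is largely fitting this noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 5))
y = np.sin(X[:, 0]) + 0.5 * rng.normal(size=400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for lam in [0.0, 1.0, 10.0]:             # illustrative values, not tuned
    model = xgb.XGBRegressor(n_estimators=200, max_depth=4,
                             reg_lambda=lam, gamma=0.0)
    model.fit(X_tr, y_tr)
    print(f"lambda={lam:5.1f}  "
          f"train MSE={mean_squared_error(y_tr, model.predict(X_tr)):.3f}  "
          f"test MSE={mean_squared_error(y_te, model.predict(X_te)):.3f}")
```

With the penalty turned up, the fitted leaf weights are smaller, which is exactly the "less confident on the training set" behaviour the question describes.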