Machine Learning – Definition of Model Complexity in XGBoost

boosting, machine-learning, predictive-models

I am having a hard time understanding the mathematical principles behind XGBoost. What really bothers me is the definition of model complexity:

$$
\Omega(f) = \gamma T + \frac12 \lambda \sum_{j=1}^T w_j^2
$$

Can someone tell me what $\gamma$ and $\lambda$ stand for?

Best Answer

First, please check Matthew Drury's answer here to a similar question.


In general, model complexity can be defined as a function of the number of free parameters: the more free parameters a model has, the more complex it is. Conversely, if a model has many parameters but we place heavy restrictions on them, so that they are not really "free", we get a "simpler" model. That is the key idea behind regularization.

There are different definitions of model complexity; AIC and BIC are widely used ones. The notation in the link you provided is not standardized across statistics / machine learning but reflects the author's own conventions. As explained in that document, $T$ is the number of leaves in a boosted tree and $w_j$ is the score (leaf weight) of leaf $j$. So, intuitively, the more leaves we have, the more free parameters we have; and the larger the weights are, the more complex the model is (which is similar to ridge regression).

To conclude, as mentioned in the original link, this is one way of defining complexity (not a "standard" one), and it penalizes both the number of leaves and the leaf weights via an L2 norm.
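To make the formula concrete, here is a minimal sketch that evaluates $\Omega(f)$ for a single toy tree; the leaf weights and hyperparameter values below are made up purely for illustration:

```python
import numpy as np

# Hypothetical toy tree with T = 3 leaves; w_j are the leaf scores.
leaf_weights = np.array([0.4, -0.3, 0.1])
gamma, lam = 1.0, 1.0  # illustrative regularization hyperparameters

T = len(leaf_weights)
omega = gamma * T + 0.5 * lam * np.sum(leaf_weights ** 2)
print(omega)  # gamma*T + 0.5*lambda*sum(w_j^2) = 3.0 + 0.13 = 3.13
```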

$\gamma$ is a threshold on the gain of a split.

If the gain from a split is smaller than $\gamma$, we do better not to add that branch, as the gain formula below makes explicit.
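For reference, the split gain used in XGBoost (from the original paper by Chen and Guestrin) is

$$
\text{Gain} = \frac12 \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma,
$$

where $G_L, H_L$ (resp. $G_R, H_R$) are the sums of the first- and second-order gradients of the loss over the instances in the left (resp. right) child. The $-\gamma$ term means a split is only kept when the loss reduction exceeds $\gamma$; note that $\lambda$ also appears here, shrinking each leaf's contribution.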

$\lambda$ is a regularization parameter.

The larger $\gamma$ and $\lambda$ are, the stronger the regularization and the simpler the resulting model.

Check this link and search for gamma and lambda:

gamma [default=0]: Minimum loss reduction required to make a further partition on a leaf node of the tree. The larger it is, the more conservative the algorithm will be.

lambda [default=1]: L2 regularization term on weights. Increasing this value will make the model more conservative.
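As a quick illustration, here is a minimal sketch using the scikit-learn wrapper of the xgboost Python package; the data and hyperparameter values are arbitrary, chosen only to show where gamma and lambda (exposed as reg_lambda) are set:

```python
import numpy as np
import xgboost as xgb

# Arbitrary toy regression data, just to have something to fit.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] + 0.1 * rng.normal(size=200)

# gamma corresponds to the gamma in Omega(f) (split-gain threshold);
# reg_lambda corresponds to lambda (L2 penalty on leaf weights).
# Raising either one regularizes the model more strongly.
model = xgb.XGBRegressor(n_estimators=50, gamma=1.0, reg_lambda=10.0)
model.fit(X, y)
```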