This makes sense to me.
I'll focus on the Gaussian case. Here each tree $T_i$ is fit on the residuals of the current model, and the model update is $M_{i+1} = M_{i} + \alpha T_i$. The idea of a gradient booster is to carefully and slowly reduce the bias of the model by adding these trees one by one.
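To make that concrete, here is a minimal sketch of the Gaussian (squared-error) boosting loop described above. It is not the author's code: `boost_gaussian` is a hypothetical helper, and a shallow scikit-learn `DecisionTreeRegressor` stands in for each $T_i$, with `alpha` playing the role of $\alpha$.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_gaussian(X, y, n_trees=100, alpha=0.1, max_depth=2):
    """Squared-error boosting: each tree is fit to the residuals of the
    current model M_i, and the update is M_{i+1} = M_i + alpha * T_i."""
    prediction = np.full(len(y), y.mean())   # M_0: constant model
    trees = []
    for _ in range(n_trees):
        residuals = y - prediction           # negative gradient of squared loss
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        prediction += alpha * tree.predict(X)
        trees.append(tree)
    return trees, prediction
```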
In this case, a large value of $w_i$ would correspond to a terminal (leaf) node giving a very large update to the prior model. The point of the regularization term is to suppress such large single-tree updates, allowing them only when the decrease in the model's loss function is large enough to offset the regularization penalty. If such an update is regularized away for a single tree but turns out to be justified, it will be baked in over multiple model updates, in keeping with the philosophy of boosting.
This is in very close analogy to ridge regression.
First, please see Matthew Drury's answer here to a similar question.
In general, model complexity can be defined as a function of the number of free parameters: the more free parameters a model has, the more complex it is. Conversely, if a model has many parameters but we put many restrictions on them, so they are not that "free", the result is a "simpler" model. That is the key idea behind regularization.
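Ridge regression is the textbook example of this trade-off: the model keeps all of its coefficients, but the penalty keeps them from being fully "free":

$$\hat{\beta}_{\text{ridge}} = \arg\min_{\beta}\; \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2.$$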
There are different definitions of model complexity. AIC and BIC are widely used.
In the link you provided, the notation is not standardized or widely used in statistics / machine learning; it reflects the author's own conventions. As mentioned in the document, $T$ is the number of leaves in a boosted tree and $w_i$ is the score of each leaf. So, intuitively, the more leaves we have, the more free parameters we have, and the larger the weights are, the more complex the model is (which is similar to ridge regression).
To conclude, as mentioned in the original link, this is one way of defining complexity (not a "standard" way): it penalizes the number of leaves and the weights of the leaves via an L2 norm.
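Concretely, the complexity term defined in the page you linked is

$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2,$$

so $\gamma$ prices each additional leaf and $\lambda$ shrinks the leaf weights, exactly in the spirit of ridge regression.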
$\gamma$ is a threshold on the gain: if the gain from a split is smaller than $\gamma$, we do better not to add that branch.
$\lambda$ is a regularization parameter.
The larger $\gamma$ and $\lambda$ are, the more regularized (and hence simpler) the model is.
Check this link and search for gamma and lambda:
gamma [default=0]
minimum loss reduction required to make a further partition on a leaf node of the tree; the larger it is, the more conservative the algorithm will be.
lambda [default=1]
L2 regularization term on weights; increasing this value will make the model more conservative.
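For instance, a hypothetical configuration just to show where these knobs live in the Python API (`reg_lambda` is the scikit-learn-style alias for `lambda`):

```python
import xgboost as xgb

# Larger gamma and reg_lambda => more conservative (more regularized) trees.
model = xgb.XGBRegressor(
    n_estimators=200,
    learning_rate=0.1,
    gamma=1.0,       # minimum loss reduction required to make a split
    reg_lambda=5.0,  # L2 penalty on leaf weights
)
# model.fit(X_train, y_train)
```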
Best Answer
XGBoost is not sensitive to monotonic transformations of its features for the same reason that decision trees and random forests are not: the model only needs to pick "cut points" on features to split a node. Splits are not sensitive to monotonic transformations: defining a split on one scale has a corresponding split on the transformed scale.
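A quick way to convince yourself is the sketch below. It assumes the exact split-finding algorithm (so that only the ordering of feature values matters) and default, deterministic sampling settings; the data and parameters are made up for illustration.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.uniform(0.1, 10, size=(500, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.1, 500)

params = dict(n_estimators=50, max_depth=3, tree_method="exact", random_state=0)
m1 = xgb.XGBRegressor(**params).fit(X, y)
m2 = xgb.XGBRegressor(**params).fit(np.log(X), y)  # monotonic transform of every feature

# The induced partitions of the training data are the same, so the fitted values agree.
print(np.allclose(m1.predict(X), m2.predict(np.log(X))))
```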
Your confusion stems from a misunderstanding of $w$. In the "Model Complexity" section, the author defines $w$ as the vector of scores on the leaves.
The score measures the weight of the leaf. See the diagram in the "Tree Ensemble" section; the author labels the number below the leaf as the "score."
The score is also defined more precisely in the paragraph preceding your expression for $\Omega(f)$, where a tree is written as $f_t(x) = w_{q(x)}$, with $w \in R^T$ and $q: R^d \to \{1, 2, \dots, T\}$.
What this expression is saying is that $q$ is a partitioning function of $R^d$, and $w$ is the weight associated with each partition. Partitioning $R^d$ can be done with coordinate-aligned splits, and coordinate-aligned splits are decision trees.
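In code, $f_t(x) = w_{q(x)}$ is just a lookup: $q$ maps a point to a leaf index via coordinate-aligned comparisons, and $w$ holds one weight per leaf. A toy illustration (not XGBoost's internals; the splits and weights here are made up):

```python
import numpy as np

# A depth-2 tree on R^2 written as a partition function q and a weight vector w.
w = np.array([-0.3, 0.1, 0.8])  # one score ("weight") per leaf, T = 3

def q(x):
    """Coordinate-aligned splits assigning x to a leaf index in {0, 1, 2}."""
    if x[0] < 2.5:
        return 0
    return 1 if x[1] < 1.0 else 2

def f(x):
    return w[q(x)]  # f(x) = w_{q(x)}

print(f(np.array([1.0, 3.0])))  # -> -0.3
```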
The meaning of $w$ is that it is a "weight" chosen so that the loss of the ensemble with the new tree is lower than the loss of the ensemble without the new tree. This is described in "The Structure Score" section of the documentation. The score for a leaf $j$ is given by
$$ w_j^* = -\frac{G_j}{H_j + \lambda} $$
where $G_j$ is the sum of the first derivatives (gradients) and $H_j$ the sum of the second derivatives (Hessians) of the loss function with respect to the prediction of the model at iteration $t-1$, taken over the samples in the $j$th leaf. (See "Additive Training" for details.)
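For example, with squared-error loss $\ell(y, \hat y) = \tfrac{1}{2}(y - \hat y)^2$ we have $g_i = \hat y_i^{(t-1)} - y_i$ and $h_i = 1$, so

$$ w_j^* = -\frac{\sum_{i \in I_j} \left(\hat y_i^{(t-1)} - y_i\right)}{|I_j| + \lambda} = \frac{\sum_{i \in I_j} r_i}{|I_j| + \lambda}, $$

i.e. the leaf weight is the average residual in the leaf, shrunk toward zero by $\lambda$ — the ridge-style shrinkage mentioned in the other answers.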