Machine Learning – Adjusted R^2 in Tree Ensembles for Model Evaluation

boosting, cart, machine-learning, model-evaluation, r-squared

Consider tree ensemble methods such as XGBoost, LightGBM, or CatBoost.

Is the adj. $R^2$ a valid metric for tree ensembles?

I'm curious because these methods handle factor (categorical) variables differently. For example, XGBoost requires some form of one-hot encoding, LightGBM groups the categories of a factor itself rather than relying on one-hot encoding, and CatBoost uses its own scheme called ordered target encoding. Without going into details, CatBoost's handling does not extend the feature space, since the categories are encoded within the original factor variable. So between XGBoost and CatBoost this always leads to a different number of predictors, which favors CatBoost in terms of the adjusted $R^2$ metric.
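For context, the dependence on the number of predictors $p$ is explicit in the usual definition of adjusted $R^2$ for a model fit on $n$ observations:

$$R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1},$$

so an encoding that inflates $p$ (one-hot) is penalized more heavily than one that leaves the column count unchanged (ordered target encoding), even if the fitted ensembles are otherwise comparable.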

Best Answer

No, adjusted $R^2$ is not a valid metric for tree ensembles.

In order to have an "adjusted" $R^2$ we need an agreed-upon concept of degrees of freedom. Tree ensembles don't have one, so the idea of "adjusting for model complexity" in a universal manner is ill-defined. On top of that, different GBM implementations use different tree-growing strategies (e.g. level-wise versus leaf-wise growth), which makes even a simple "tree-to-tree" comparison somewhat moot. We should instead use a proper cross-validation scheme with plain $R^2$ (if that is a metric relevant to the application at hand).
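As a sketch of what that cross-validated $R^2$ looks like in practice, the snippet below uses scikit-learn's `cross_val_score` with `scoring="r2"`. The `GradientBoostingRegressor` and the synthetic data are only stand-ins; the sklearn-compatible estimators from XGBoost, LightGBM, or CatBoost could be dropped in the same way.

```python
# Minimal sketch: estimate out-of-fold R^2 for a tree ensemble via cross-validation.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data, purely for illustration.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

model = GradientBoostingRegressor(random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# scoring="r2" computes the plain (unadjusted) R^2 on each held-out fold.
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
print(f"CV R^2: {scores.mean():.3f} ± {scores.std():.3f}")
```

Because the score is computed on held-out folds, it already accounts for overfitting empirically, which is the role the degrees-of-freedom adjustment plays in the classical linear-model setting.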