R-Squared – Adjusted $R^2$ as an Alternative to a Test Set?

Tags: multiple-regression, r-squared

I am trying to understand the following statement from Max Kuhn and Julia Silge about the use of adjusted $R^2$ when fitting a regression model. Is it correct that the adjusted $R^2$ does not need to be applied to a test set? Their suggestion seems to be that its primary use is when you are using the same data set to train and evaluate your model. Is this correct?

The yardstick package does not contain a function for adjusted $R^2$. This commonly used modification of the coefficient of determination is needed when the same data used to fit the model are used to evaluate the model. This metric is not fully supported in tidymodels because it is always a better approach to compute performance on a separate data set than the one used to fit the model.

https://www.tmwr.org/performance.html

Best Answer

In short, yes. I would not apply an adjusted $R^2$ to a test set.

The test set $R^2$ is already a "shrinkage estimate" of the $R^2$, in the same way that the adjusted $R^2$ is. Applying the adjusted $R^2$ to a test set is therefore likely to overcorrect: as the authors of yardstick note, that adjustment is meant for the $R^2$ of the training sample. Moreover, the adjusted $R^2$'s correction rests on assumptions about the nature of the model and the variables in it, which may not hold in many cases (see the Wikipedia article on the coefficient of determination for an extensive discussion of $R^2$ and adjusted $R^2$).
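For reference, the usual adjustment is a fixed algebraic correction of the training $R^2$:

$$R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1},$$

where $n$ is the number of training observations and $p$ is the number of predictors. Applying this correction to a test set $R^2$, which has already "paid" for the model's flexibility by being computed on unseen data, shrinks the estimate twice.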

By contrast, when you fit a model on a training set, compute its $R^2$, and then examine how much the $R^2$ shrinks when that model is applied to the test set (that is, when you estimate the test set $R^2$), you are doing the same thing, conceptually, as what the adjusted $R^2$ attempts to do. The adjusted $R^2$ shrinks via a mathematical correction; the test set $R^2$ shrinks by evaluating on held-out data.
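To make the contrast concrete, here is a minimal sketch (not from the book; the data set, 70/30 split, and predictors are arbitrary choices for illustration):

```r
library(yardstick)  # for rsq_vec()

set.seed(123)

# Arbitrary 70/30 split of a built-in data set, purely for illustration
train_idx <- sample(nrow(mtcars), size = floor(0.7 * nrow(mtcars)))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

fit <- lm(mpg ~ wt + hp, data = train)

# Adjusted R^2: mathematical correction applied to the training-sample R^2
summary(fit)$adj.r.squared

# Test set R^2: plain R^2 computed on held-out data, with no further adjustment
rsq_vec(truth = test$mpg, estimate = predict(fit, newdata = test))
```

Both numbers address the same question (how well should we expect the model to generalize?), but the first relies on the assumptions baked into the formula, while the second relies only on the held-out data.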

Both the adjusted $R^2$ and the test set $R^2$ serve a similar purpose: to account for the original model's overfitting to the data. They simply make that adjustment in different ways. The test set $R^2$ makes fewer strong assumptions about the nature of the model, which I believe is the idea underlying the quote in the original post.
