From a quick reading of the paper you mention, it appears the author makes an unnecessary methodological mistake.
$R^2$-adjusted is NOT a measure of fit ("fit" which the author, not unjustifiably, maps conceptually to "explanatory relevance"). Theil proposed this metric to evaluate alternative sets of regressors on the same data set while penalizing the inflation in the number of regressors used, as you point out. The way the metric is constructed, it cannot be meaningfully interpreted as showing the explanatory power of the regressors.
But I believe the author had no need for $R^2$-adjusted, because the regressors are very few in the various specifications he implements. He could have used the simple $R^2$ and would most probably have arrived at the same conclusions, which would then be methodologically valid, because $R^2$ is indeed a measure of fit, and no issues arise when comparing how $R^2$ performs, for the same regressors, over different data sets or over time.
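To make the contrast concrete, here is a minimal numpy-only sketch (with hypothetical simulated data) of the mechanics at issue: for nested specifications, $R^2$ can only rise as noise regressors are added, while the adjusted $R^2$ applies Theil's penalty for the extra regressors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y depends on one real regressor; the rest are pure noise.
n = 50
x_signal = rng.normal(size=(n, 1))
x_noise = rng.normal(size=(n, 4))
y = 2.0 * x_signal[:, 0] + rng.normal(size=n)

def r2_and_adjusted(X, y):
    """Fit OLS with an intercept; return (R^2, adjusted R^2)."""
    n, k = X.shape
    Xd = np.column_stack([np.ones(n), X])           # design matrix with intercept
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    ss_res = ((y - Xd @ beta) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)  # Theil's penalty on k
    return r2, adj

# Nested specifications: adding noise regressors never lowers R^2,
# but the adjusted R^2 penalizes the inflation in their number.
results = {}
for k in (1, 3, 5):
    X = np.column_stack([x_signal, x_noise[:, : k - 1]])
    results[k] = r2_and_adjusted(X, y)
    print(k, results[k])
```

The point of the sketch is only the mechanics of the penalty; it says nothing about whether either quantity measures "explanatory relevance".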
You could communicate with the author on the matter and ask why he used $R^2$-adjusted instead of the simple $R^2$; it is always good when papers generate discussion.
In my field (social science using cross-sectional surveys), an adjusted $R^2$ of .87 would be much too large. That would be a sure sign that you have done something meaningless, like predicting something with a second measure of itself. So whether or not you need to improve your model depends on the context, which you did not give us.
If you are looking for alternative transformations of your explanatory (right-hand-side, predictor) variables, you could consider fractional polynomials:
Royston P, Altman DG. (1994): Regression using fractional polynomials of continuous covariates: parsimonious parametric modelling (with discussion). Applied Statistics, 43:429-467.
Royston P, Ambler G, Sauerbrei W. (1999): The use of fractional polynomials to model continuous risk variables in epidemiology. International Journal of Epidemiology, 28:964-974.
Royston P, Sauerbrei W. (2004): A new approach to modelling interactions between treatment and continuous covariates in clinical trials by using fractional polynomials. Statistics in Medicine, 23:2509-2525.
Royston P, Sauerbrei W. (2007): Improving the robustness of fractional polynomial models by preliminary covariate transformation: a pragmatic approach. Computational Statistics and Data Analysis, 51:4240-4253.
Royston P, Sauerbrei W. (2008): Multivariable Model-Building - A pragmatic approach to regression analysis based on fractional polynomials for continuous variables. Wiley.
Sauerbrei W. (1999): The use of resampling methods to simplify regression models in medical statistics. Applied Statistics, 48:313-329.
Sauerbrei W, Meier-Hirmer C, Benner A, Royston P. (2006): Multivariable regression model building by using fractional polynomials: Description of SAS, STATA and R programs. Computational Statistics & Data Analysis, 50:3464-3485.
Sauerbrei W, Royston P. (1999): Building multivariable prognostic and diagnostic models: transformation of the predictors by using fractional polynomials. Journal of the Royal Statistical Society A, 162:71-94.
Sauerbrei W, Royston P, Binder H. (2007): Selection of important variables and determination of functional form for continuous predictors in multivariable model building. Statistics in Medicine, 26:5512-5528.
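To give a flavor of the idea, here is an illustrative first-degree fractional polynomial (FP1) sketch in Python. It uses the standard FP power set from Royston and Altman (1994) and selects the power by residual sum of squares; the data, function names, and selection rule are my own simplification, not the full procedure implemented in the SAS/Stata/R programs cited above.

```python
import numpy as np

# FP1 power set from Royston & Altman (1994); p = 0 is taken as log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

def fp1_transform(x, p):
    """Apply the FP1 transformation x^p, with p = 0 meaning log(x)."""
    return np.log(x) if p == 0 else x ** p

def best_fp1(x, y):
    """Pick the FP1 power minimizing the residual sum of squares (illustrative)."""
    best = None
    n = len(x)
    for p in FP_POWERS:
        Xd = np.column_stack([np.ones(n), fp1_transform(x, p)])
        beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
        rss = ((y - Xd @ beta) ** 2).sum()
        if best is None or rss < best[1]:
            best = (p, rss)
    return best

rng = np.random.default_rng(1)
x = rng.uniform(0.5, 5.0, size=200)   # x must be positive for these powers
y = 3.0 * np.log(x) + rng.normal(scale=0.1, size=200)  # true form is logarithmic
p, rss = best_fp1(x, y)
print("selected power:", p)
```

On this simulated logarithmic relationship, the selection should recover $p = 0$ (the log transform). The real methodology also handles second-degree FPs, significance-based function selection, and multivariable building; see the references.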
Best Answer
In short, yes. I would not apply an adjusted $R^2$ to a test set.
The test set $R^2$ is a "shrinkage estimate" of the $R^2$ in the same way as the adjusted $R^2$. Thus, applying the adjusted $R^2$ to a test set is likely to overcorrect. This is because, as the authors of yardstick note, that adjustment is typically applied to the $R^2$ of the training sample. The adjusted $R^2$'s correction is based on assumptions about the nature of the model and the variables in it (which may not hold in many cases; see this wiki for an extensive discussion of $R^2$ as well as adjusted $R^2$). By contrast, when you estimate a model on a training set, compute its $R^2$, and then examine the shrinkage in the $R^2$ when applying that model to the test set (that is, when you estimate the test set $R^2$), you are doing conceptually the same thing the adjusted $R^2$ attempts to do. The adjusted $R^2$ shrinks using a mathematical correction; the test set $R^2$ shrinks using out-of-sample validation.
Both the adjusted $R^2$ and the test set $R^2$ serve a similar purpose: to evaluate, and adjust for, the original model's overfitting to the data; they simply go about the adjustment in different ways. The test set $R^2$ makes fewer strong assumptions about the nature of the model, which I believe is the idea underlying the quote in the original post.
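The two kinds of shrinkage can be put side by side in a short numpy-only sketch. The data are hypothetical (one weak signal among many noise predictors, so the model overfits); the adjusted $R^2$ shrinks the training $R^2$ analytically, while the test set $R^2$ shrinks it empirically.

```python
import numpy as np

rng = np.random.default_rng(2)

def ols_fit(X, y):
    """OLS with an intercept; returns the coefficient vector."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta

def r2(X, y, beta):
    """R^2 of predictions from beta, using this sample's mean as baseline
    (the usual out-of-sample convention, as in scikit-learn's r2_score)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    ss_res = ((y - Xd @ beta) ** 2).sum()
    return 1.0 - ss_res / (((y - y.mean()) ** 2).sum())

# Hypothetical data: weak signal plus many noise predictors -> overfitting.
n, k = 60, 10
X = rng.normal(size=(n, k))
y = 0.5 * X[:, 0] + rng.normal(size=n)
X_tr, X_te, y_tr, y_te = X[:40], X[40:], y[:40], y[40:]

beta = ols_fit(X_tr, y_tr)
r2_train = r2(X_tr, y_tr, beta)
adj_train = 1 - (1 - r2_train) * (40 - 1) / (40 - k - 1)  # analytic shrinkage
r2_test = r2(X_te, y_te, beta)                            # empirical shrinkage
print(f"train R^2={r2_train:.3f}  adjusted={adj_train:.3f}  test={r2_test:.3f}")
```

Both shrunken quantities fall below the training $R^2$, but the analytic correction relies on model assumptions, whereas the test set estimate does not.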