Solved – Adjusted $R^2$ versus $R^2$ in multiple regression

multiple-regression, r-squared

I have 5 predictors in a multiple regression model with sample sizes that range from 157 to 330 for each predictor. Given the variation in sample size, is it better to use the adjusted R-squared value rather than the R-squared value?

Best Answer

$R^2$ measures goodness of fit, but it will not detect overfitting because it increases (or at least does not decrease) with every new predictor, unless it has already reached 1. Since adjusted $R^2$ can decrease, it can reveal that some models overfit the data; other criteria such as AIC and BIC, which also penalize model complexity, may work even better. So if the purpose is to measure goodness of fit, $R^2$ is appropriate and expresses the percentage of variance explained by the model. If the purpose is to identify whether or not the model overfits the data, adjusted $R^2$ is more appropriate.

The sample size enters only because when the number of parameters is large and the sample size is small, the degree of overfitting will be more severe than under the same circumstances with a much larger sample size. With the larger sample, coefficients that should be 0 will be estimated close to 0 and so will not hurt prediction as much as in a small sample, where a coefficient could be inappropriately large.
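To make the penalty concrete: adjusted $R^2$ is $1 - (1 - R^2)\frac{n-1}{n-p-1}$, where $n$ is the sample size and $p$ is the number of predictors. The sketch below is a hypothetical simulation (not from the original answer) that fits ordinary least squares with NumPy and computes both quantities by hand; it shows that $R^2$ never drops as pure-noise predictors are added, while adjusted $R^2$ typically stalls or falls.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 200                                    # hypothetical sample size
X = rng.normal(size=(n, 2))                # two predictors that actually matter
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(size=n)

def r2_and_adjusted(X, y):
    """Fit OLS with an intercept; return (R^2, adjusted R^2)."""
    n, p = X.shape                         # p = number of predictors (excluding intercept)
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1 - ss_res / ss_tot
    adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return r2, adj

# Repeatedly tack on a pure-noise predictor and watch the two statistics diverge.
for _ in range(6):
    r2, adj = r2_and_adjusted(X, y)
    print(f"{X.shape[1]} predictors: R^2 = {r2:.4f}, adjusted R^2 = {adj:.4f}")
    X = np.column_stack([X, rng.normal(size=n)])
```

Running this, $R^2$ creeps upward with each useless column while adjusted $R^2$ levels off or declines, which is exactly the behavior that makes the adjusted version useful for spotting overfit.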