Solved – Adjusted R-squared: number of terms or independent variables

Tags: interaction, overfitting, r-squared, regression, statistical significance

When applying a multiple linear regression, does the adjusted R-squared value depend on the number of independent variables in the model or the number of terms? Specifically, I'm concerned that adding interaction terms while keeping the number of independent variables the same may artificially inflate my adjusted R-squared value. Would this be the case? If so, is there a better metric of fit that I could use that acknowledges these additional interaction terms?

Best Answer

The adjustment is for the number of terms in the regression

If you add interactions, adjusted $R^2$ is not "inflated" by them: if the new terms add nothing of value, adjusted $R^2$ goes down, just as it does when you add variables that are unrelated to the response.

See Wikipedia's article on Coefficient of determination, in the section on Adjusted $R^2$:

a modification due to Theil[7] of $R^2$ that adjusts for the number of explanatory terms in a model relative to the number of data points

The formulas given there indicate the same thing:

$$\bar R^2 = {1-(1-R^{2}){n-1 \over n-p-1}} = {R^{2}-(1-R^{2}){p \over n-p-1}}$$

(here $n$ is the number of data points and $p$ is the number of explanatory terms excluding the intercept, so interaction terms count toward $p$ just like main effects) and

$$\bar R^2 = {1-{SS_\text{res}/df_e \over SS_\text{tot}/df_t}}$$

(i.e. it uses the error degrees of freedom, $df_e = n - p - 1$, which decreases with every term you add, so it adjusts for the effect of adding terms whether they are interactions or not).
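As a quick numerical check (a minimal Python sketch; the values of $n$, $p$, $R^2$, and the sums of squares are made up purely for illustration), the two formulas agree, and increasing $p$ while $R^2$ barely moves pushes adjusted $R^2$ down:

```python
# Adjusted R^2 computed from the two equivalent formulas above.
# n  = number of data points
# p  = number of explanatory terms (interactions included, intercept excluded)
# r2 = ordinary R^2

def adj_r2_from_r2(r2, n, p):
    """1 - (1 - R^2) * (n - 1) / (n - p - 1)"""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def adj_r2_from_ss(ss_res, ss_tot, n, p):
    """1 - (SS_res / df_e) / (SS_tot / df_t), with df_e = n - p - 1, df_t = n - 1"""
    return 1 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))

n = 50
print(adj_r2_from_r2(0.60, n, p=3))   # 3 main effects         -> ~0.574
print(adj_r2_from_r2(0.61, n, p=6))   # + 3 weak interactions  -> ~0.556

# Same answer from the sums-of-squares form (SS_res = (1 - R^2) * SS_tot):
ss_tot = 100.0
print(adj_r2_from_ss(0.40 * ss_tot, ss_tot, n, p=3))           # ~0.574
```

Even though $R^2$ ticks up from 0.60 to 0.61, the three extra terms cost enough degrees of freedom that adjusted $R^2$ falls.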

So adjusted $R^2$ unambiguously accounts for the effect of adding new terms into your model, whether they're from interactions between existing variables or from additional variables.
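To see the penalty in action on fitted models, here is a small simulation sketch with statsmodels (the variable names and the data-generating setup are invented for illustration): the response truly depends only on the main effects of `x1` and `x2`, so adding the `x1:x2` interaction leaves $R^2$ essentially unchanged while adjusted $R^2$ typically drops.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
# True model: main effects only, no interaction.
df["y"] = 1.0 + 2.0 * df["x1"] - 1.5 * df["x2"] + rng.normal(scale=2.0, size=n)

main  = smf.ols("y ~ x1 + x2", data=df).fit()            # p = 2 terms
inter = smf.ols("y ~ x1 + x2 + x1:x2", data=df).fit()    # p = 3 terms

print(f"main effects : R2 = {main.rsquared:.4f}, adj R2 = {main.rsquared_adj:.4f}")
print(f"+ interaction: R2 = {inter.rsquared:.4f}, adj R2 = {inter.rsquared_adj:.4f}")
# Plain R^2 can only stay the same or rise when a term is added; adjusted R^2
# typically falls here because the interaction carries no real signal.
```

Running the same comparison with a data-generating process that does include an interaction would instead show adjusted $R^2$ improving when the `x1:x2` term is added, which is exactly the behavior you want from a penalized measure of fit.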