1. What formula does `lm` in R use for adjusted r-square?
As already mentioned, typing `summary.lm` will give you the code that R uses to calculate adjusted R square. Extracting the most relevant line, you get:

```r
ans$adj.r.squared <- 1 - (1 - ans$r.squared) * ((n - df.int)/rdf)
```
which corresponds in mathematical notation to:
$$R^2_{adj} = 1 - (1 - R^2) \frac{n-1}{n-p-1}$$
assuming that there is an intercept (i.e., `df.int = 1`), $n$ is your sample size, and $p$ is your number of predictors. Thus, your error degrees of freedom (i.e., `rdf`) equals $n - p - 1$.
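As a quick check, you can recompute the adjustment by hand and compare it to what `lm` reports (a sketch using the built-in `mtcars` data and an arbitrary two-predictor model):

```r
# Recompute adjusted R^2 by hand and compare to lm()'s value
fit <- summary(lm(mpg ~ wt + hp, data = mtcars))
n <- nrow(mtcars)  # sample size
p <- 2             # number of predictors (wt and hp)
by_hand <- 1 - (1 - fit$r.squared) * (n - 1) / (n - p - 1)
all.equal(by_hand, fit$adj.r.squared)  # TRUE
```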
The formula corresponds to what Yin and Fan (2001) label Wherry Formula-1 (there is apparently another, less common Wherry formula that uses $n-p$ in the denominator instead of $n-p-1$). They suggest its most common names, in order of occurrence, are "Wherry formula", "Ezekiel formula", "Wherry/McNemar formula", and "Cohen/Cohen formula".
2. Why are there so many adjusted r-square formulas?
$R^2_{adj}$ aims to estimate $\rho^2$, the proportion of variance explained in the population by the population regression equation. While this is clearly related to sample size and the number of predictors, what is the best estimator is less clear. Thus, you have simulation studies such as Yin and Fan (2001) that have evaluated different adjusted r-square formulas in terms of how well they estimate $\rho^2$ (see this question for further discussion).
You will see that with all the formulas, the difference between $R^2$ and $R^2_{adj}$ gets smaller as the sample size increases, approaching zero as the sample size tends to infinity. The difference also gets smaller with fewer predictors.
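To see this shrinkage numerically (a sketch using arbitrary values $R^2 = 0.5$ and $p = 3$):

```r
# Adjusted R^2 for fixed R^2 = 0.5 and p = 3 at increasing sample sizes
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)
sapply(c(10, 50, 500, 5000), function(n) adj_r2(0.5, n, p = 3))
# the values climb toward 0.5 as n grows
```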
3. How to interpret $R^2_{adj}$?
$R^2_{adj}$ is an estimate of $\rho^2$, the proportion of variance explained by the true regression equation in the population. You would typically be interested in $\rho^2$ when you are interested in the theoretical linear prediction of a variable. In contrast, if you are more interested in prediction using the sample regression equation, as is often the case in applied settings, then some form of cross-validated $R^2$ would be more relevant.
References
- Yin, P., & Fan, X. (2001). Estimating $R^2$ shrinkage in multiple regression: A comparison of different analytical methods. The Journal of Experimental Education, 69(2), 203-224.
The adjustment is for the number of terms in the regression
If you add interactions, adjusted $R^2$ is not "inflated" because of them ... if the terms add nothing of value, adjusted $R^2$ goes down just as it does when you add new variables that don't relate to the response.
See Wikipedia's article on Coefficient of determination, in the section on Adjusted $R^2$:
a modification due to Theil[7] of $R^2$ that adjusts for the number of explanatory terms in a model relative to the number of data points
... and then the formulas given indicate the same thing:
$$\bar R^2 = {1-(1-R^{2}){n-1 \over n-p-1}} = {R^{2}-(1-R^{2}){p \over n-p-1}}$$
(since interaction terms count toward the number of parameters $p$) and
$$\bar R^2 = {1-{SS_\text{res}/df_e \over SS_\text{tot}/df_t}}$$
(i.e. because it uses a term with the df for error, which will go down as you add interaction terms, it clearly adjusts for the effect of adding terms whether they're interaction terms or not).
So adjusted $R^2$ unambiguously accounts for the effect of adding new terms into your model, whether they're from interactions between existing variables or from additional variables.
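A small simulated sketch (arbitrary seed, pure-noise data) illustrates the penalty applying to an interaction term just like any other predictor:

```r
# With pure noise, adding the x1:x2 interaction spends a degree of
# freedom without explaining anything, so adjusted R^2 tends to drop
set.seed(1)
d <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))
summary(lm(y ~ x1 + x2, data = d))$adj.r.squared
summary(lm(y ~ x1 * x2, data = d))$adj.r.squared  # one more term in p
```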
Best Answer
To answer your first question: if you have even a single predictor, the $R^2$ and adj. $R^2$ would be different. Check the adj. $R^2$ formula below:
$$\bar R^2 = 1 - (1 - R^2)\frac{n-1}{n-p-1}$$
You can see that having even a single predictor ($p = 1$) changes the denominator, and hence the adj. $R^2$.
The difference between the two should be very small for a model with a single predictor (and large sample size). I don't think there is a right or wrong value here, and it is hard to tell whether the difference is meaningful or not without information on the analysis. In any case, which one to use depends on your objectives.
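For instance (a sketch with made-up values), with a single predictor the size of the gap depends almost entirely on $n$:

```r
# Gap between R^2 and adjusted R^2 for a single predictor (p = 1)
r2 <- 0.30
for (n in c(15, 150)) {
  cat(n, ":", 1 - (1 - r2) * (n - 1) / (n - 2), "\n")
}
# the adjusted value moves back toward 0.30 as n grows
```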