$R^2=1-\frac{SSE}{SST}$, where $SSE$ is the sum of squared errors (residuals, i.e., deviations from the regression line) and $SST$ is the sum of squared deviations from the mean of the dependent variable $Y$.
$MSE=\frac{SSE}{n-m}$, where $n$ is the sample size and $m$ is the number of parameters in the model (including intercept, if any).
$R^2$ is a standardized measure of the degree of fit (proportion of variance explained) in the sample. $MSE$ is the estimate of the variance of the residuals, or non-fit, in the population. The two measures are clearly related, as seen in the most usual formula for adjusted $R^2$ (the estimate of $R^2$ for the population):
$R_{adj}^2=1-(1-R^2)\frac{n-1}{n-m}=1-\frac{SSE/(n-m)}{SST/(n-1)}=1-\frac{MSE}{\sigma_y^2}$, where $\sigma_y^2 = SST/(n-1)$ is the (unbiased) sample variance of $Y$.
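These identities are easy to verify numerically. Below is a minimal R sketch, assuming the built-in `mtcars` data and an arbitrary illustrative model, that reproduces $R^2$ and $R^2_{adj}$ from $SSE$, $SST$, and $MSE$:

```r
# Verify the identities above against lm's own output
fit <- lm(mpg ~ wt + hp, data = mtcars)
n   <- nrow(mtcars)
m   <- length(coef(fit))                        # parameters incl. intercept (m = 3)

sse <- sum(residuals(fit)^2)                    # SSE
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)   # SST

r2     <- 1 - sse / sst                         # matches summary(fit)$r.squared
mse    <- sse / (n - m)                         # MSE = SSE/(n - m)
r2_adj <- 1 - mse / var(mtcars$mpg)             # var() divides by n - 1, i.e. SST/(n-1)
r2_adj                                          # matches summary(fit)$adj.r.squared
```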
1. What formula does `lm` in R use for adjusted R-square?
As already mentioned, typing `summary.lm` will give you the code that R uses to calculate adjusted R-square. Extracting the most relevant line you get:

```r
ans$adj.r.squared <- 1 - (1 - ans$r.squared) * ((n - df.int)/rdf)
```
which corresponds in mathematical notation to:
$$R^2_{adj} = 1 - (1 - R^2) \frac{n-1}{n-p-1}$$
assuming that there is an intercept (i.e., `df.int = 1`), $n$ is your sample size, and $p$ is your number of predictors. Thus, your error degrees of freedom (i.e., `rdf`) equals $n - p - 1$.
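To make the correspondence concrete, here is a sketch that hand-computes that line and checks it against `lm`'s own output (the `mtcars` data and the model `mpg ~ wt + hp` are arbitrary choices for illustration):

```r
# Replicate the summary.lm computation by hand
fit    <- lm(mpg ~ wt + hp, data = mtcars)
n      <- nrow(model.frame(fit))    # sample size
df.int <- 1                         # the model has an intercept
rdf    <- df.residual(fit)          # error degrees of freedom, n - p - 1
r2     <- summary(fit)$r.squared

1 - (1 - r2) * ((n - df.int) / rdf) # same value as summary(fit)$adj.r.squared
```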
The formula corresponds to what Yin and Fan (2001) label Wherry Formula-1 (there is apparently another, less common Wherry formula that uses $n-p$ in the denominator instead of $n-p-1$). They suggest its most common names, in order of occurrence, are "Wherry formula", "Ezekiel formula", "Wherry/McNemar formula", and "Cohen/Cohen formula".
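To see what the two denominators do in practice, here is a small illustrative sketch contrasting them (same arbitrary model as above; the names `wherry1`/`wherry2` are just labels following Yin and Fan's usage):

```r
# The two Wherry-style denominators side by side
fit <- lm(mpg ~ wt + hp, data = mtcars)
n   <- nrow(mtcars)
p   <- 2                                        # wt and hp
r2  <- summary(fit)$r.squared

wherry1 <- 1 - (1 - r2) * (n - 1) / (n - p - 1) # what lm reports
wherry2 <- 1 - (1 - r2) * (n - 1) / (n - p)     # the less common variant
c(wherry1 = wherry1, wherry2 = wherry2)
```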
2. Why are there so many adjusted r-square formulas?
$R^2_{adj}$ aims to estimate $\rho^2$, the proportion of variance explained in the population by the population regression equation. While this is clearly related to sample size and the number of predictors, which estimator is best is less clear. Thus, you have simulation studies such as Yin and Fan (2001) that have evaluated different adjusted R-square formulas in terms of how well they estimate $\rho^2$ (see this question for further discussion).
You will see with all of the formulas that the difference between $R^2$ and $R^2_{adj}$ gets smaller as the sample size increases, approaching zero as the sample size tends to infinity. The difference also gets smaller with fewer predictors.
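A quick simulation sketch makes that convergence visible (the choice of five pure-noise predictors and these sample sizes is arbitrary):

```r
# Gap between R^2 and adjusted R^2 shrinks as n grows
set.seed(1)
gap <- function(n, p = 5) {
  X <- matrix(rnorm(n * p), n, p)
  y <- rnorm(n)                        # y unrelated to X, so rho^2 = 0
  s <- summary(lm(y ~ X))
  s$r.squared - s$adj.r.squared
}
sapply(c(20, 50, 200, 1000), gap)      # differences shrink toward zero
```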
3. How to interpret $R^2_{adj}$?
$R^2_{adj}$ is an estimate of $\rho^2$, the proportion of variance explained by the true regression equation in the population. You would typically be interested in $\rho^2$ when you care about the theoretical linear prediction of a variable. In contrast, if you are more interested in prediction using the sample regression equation, as is often the case in applied settings, then some form of cross-validated $R^2$ would be more relevant.
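For completeness, one simple form of cross-validated $R^2$ is a leave-one-out version; the sketch below is one possible hand-rolled implementation (again using `mtcars` purely for illustration), not the only way to do it:

```r
# Leave-one-out cross-validated R^2
cv_r2 <- function(formula, data) {
  n    <- nrow(data)
  pred <- numeric(n)
  for (i in seq_len(n)) {
    fit     <- lm(formula, data = data[-i, ])            # fit without row i
    pred[i] <- predict(fit, newdata = data[i, , drop = FALSE])
  }
  y <- model.response(model.frame(formula, data))
  1 - sum((y - pred)^2) / sum((y - mean(y))^2)
}
cv_r2(mpg ~ wt + hp, mtcars)   # typically below the in-sample R^2
```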
References
- Yin, P., & Fan, X. (2001). Estimating $R^2$ shrinkage in multiple regression: A comparison of different analytical methods. The Journal of Experimental Education, 69(2), 203-224.
Best Answer
I won't go into the real maths of it (as I don't understand it myself), but I can explain it in more general terms.
Multiple R-squared is simply $R^2$ for models that have multiple predictor variables. It measures the amount of variation in the response variable that can be explained by the predictor variables. The fundamental point is that when you add predictors to your model, the multiple R-squared can never decrease, since each added predictor will always explain at least some portion of the variance.
Adjusted R-squared controls for this increase by adding a penalty for the number of predictors in the model. It therefore reflects a balance between the most parsimonious model and the best-fitting model. Generally, a large difference between your multiple and your adjusted R-squared indicates that you may have overfit your model.
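A toy demonstration of both points (the noise predictor and the seed are arbitrary):

```r
# Adding a pure-noise predictor: multiple R^2 rises, adjusted R^2 typically falls
set.seed(42)
dat       <- mtcars
dat$noise <- rnorm(nrow(dat))

s1 <- summary(lm(mpg ~ wt + hp, data = dat))
s2 <- summary(lm(mpg ~ wt + hp + noise, data = dat))

c(r2_before  = s1$r.squared,     r2_after  = s2$r.squared)      # never decreases
c(adj_before = s1$adj.r.squared, adj_after = s2$adj.r.squared)  # penalised
```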
Hope this helps. Hopefully someone will come along and explain this in more depth.