1. What formula does `lm` in R use for adjusted r-square?
As already mentioned, typing `summary.lm` will give you the code that R uses to calculate adjusted $R^2$. Extracting the most relevant line, you get:
ans$adj.r.squared <- 1 - (1 - ans$r.squared) * ((n - df.int)/rdf)
which corresponds in mathematical notation to:
$$R^2_{adj} = 1 - (1 - R^2) \frac{n-1}{n-p-1}$$
assuming that there is an intercept (i.e., `df.int = 1`), $n$ is your sample size, and $p$ is your number of predictors. Thus, your error degrees of freedom (i.e., `rdf`) equals $n-p-1$.
The formula corresponds to what Yin and Fan (2001) label Wherry Formula-1 (there is apparently another, less common Wherry formula that uses $n-p$ in the denominator instead of $n-p-1$). They suggest its most common names, in order of frequency, are "Wherry formula", "Ezekiel formula", "Wherry/McNemar formula", and "Cohen/Cohen formula".
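As a sketch, the line from `summary.lm` can be mirrored in Python (the function name and the example numbers are my own, purely for illustration):

```python
def adjusted_r_squared(r_squared, n, p, intercept=True):
    """Wherry Formula-1, the adjustment used by R's summary.lm()."""
    df_int = 1 if intercept else 0
    rdf = n - p - df_int  # residual (error) degrees of freedom
    return 1 - (1 - r_squared) * (n - df_int) / rdf

# Example: R^2 = 0.75 with n = 30 observations and p = 3 predictors
print(adjusted_r_squared(0.75, 30, 3))
```

With an intercept this reduces to $1 - (1 - R^2)(n-1)/(n-p-1)$, matching the formula above.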
2. Why are there so many adjusted r-square formulas?
$R^2_{adj}$ aims to estimate $\rho^2$, the proportion of variance explained in the population by the population regression equation. While this is clearly related to sample size and the number of predictors, which estimator is best is less clear. Thus, you have simulation studies such as Yin and Fan (2001) that have evaluated different adjusted r-square formulas in terms of how well they estimate $\rho^2$ (see this question for further discussion).
You will see with all the formulas that the difference between $R^2$ and $R^2_{adj}$ gets smaller as the sample size increases; the difference approaches zero as sample size tends to infinity. The difference also gets smaller with fewer predictors.
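A quick numerical illustration of this shrinkage, using the Wherry Formula-1 above with an arbitrary $R^2 = 0.5$ and $p = 3$ (my own example values):

```python
# Difference between R^2 and adjusted R^2 shrinks as n grows
r2, p = 0.5, 3
diffs = []
for n in (10, 30, 100, 1000):
    adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    diffs.append(r2 - adj)
    print(n, round(r2 - adj, 4))
```

The gap falls from a quarter of the total at $n = 10$ to well under one percent at $n = 1000$.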
3. How to interpret $R^2_{adj}$?
$R^2_{adj}$ is an estimate of $\rho^2$, the proportion of variance explained by the true regression equation in the population. You would typically be interested in $\rho^2$ when you care about the theoretical linear prediction of a variable. In contrast, if you are more interested in prediction using the sample regression equation, as is often the case in applied settings, then some form of cross-validated $R^2$ would be more relevant.
References
- Yin, P., & Fan, X. (2001). Estimating $R^2$ shrinkage in multiple regression: A comparison of different analytical methods. The Journal of Experimental Education, 69(2), 203-224.
It's true that $R^2$ in instrumental variables regressions is not useful. Since one of the explanatory variables $x$ is correlated with the error $\epsilon$, we can't decompose the variance of the outcome $y$ into $\beta^2 Var(x) + Var(\epsilon)$, so the obtained $R^2$ neither has a natural interpretation nor can it be used to compute F-tests of joint significance. Also, $R^2$ in instrumental variables regression can be negative, and for this point it makes no difference whether you use
$$R^2 = \frac{MSS}{TSS} \quad \text{or} \quad R^2 = 1- \frac{RSS}{TSS}$$
because when $RSS > TSS$, we also have $MSS = TSS - RSS < 0$. In general the two expressions are equal, so there should be no reason for one to be more popular than the other. The issue is discussed at greater length in the Stata website's resources and support FAQs.
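The equivalence of the two expressions in ordinary OLS (with an intercept) is easy to check numerically; here is a small sketch with simulated data of my own construction:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)

# OLS with an intercept
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
fitted = X @ beta

tss = np.sum((y - y.mean()) ** 2)
mss = np.sum((fitted - fitted.mean()) ** 2)
rss = np.sum((y - fitted) ** 2)

# With an intercept the decomposition TSS = MSS + RSS holds exactly,
# so the two R^2 definitions coincide
r2_a = mss / tss
r2_b = 1 - rss / tss
```

It is precisely the decomposition $TSS = MSS + RSS$ that breaks down in the IV setting described above.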
[Edit] To address the additional question in the comment:
When you instrument the endogenous variable $x$ with your instrument $z$ as
$$x = \alpha + \pi z + \eta$$
you use the predicted values $\widehat{x}$ in the second stage
$$y = a + \beta \widehat{x} + \epsilon$$
and if you do this procedure by hand in Stata like
reg x z
predict x_hat, xb
reg y x_hat
the residuals, and hence the standard errors, will be computed from $e = y - a - \beta\widehat{x}$, but these standard errors will be wrong. They are wrong because $\widehat{x}$ is an estimated quantity, not an observed variable. A property of these residuals, though, is that $RSS \le TSS$, so there can be no negative $R^2$: $\widehat{x}\beta$ is guaranteed to be at least as good a predictor of $y$ as $\overline{y}$.
To calculate the corrected standard errors, you use the actual values of the endogenous variable $x$, not its fitted values, when computing $e = y - x\beta$. The issue is that you are then computing the $RSS$ from a different set of regressors than the ones used to fit the model from which the $TSS$ is taken. For this reason it can happen that $x\beta$ is a worse predictor of $y$ than $\overline{y}$, in which case $R^2$ is negative.
Best Answer
You are fitting multiple parameters in your model. (Usually, you fit one parameter for every variable, but your model is non-linear, so that isn't the case here, even though you have only one $X$ variable.) With every additional parameter, your model has the opportunity to fit the data better, even if that parameter shouldn't be fitted (e.g., if $b$ or $g$ is actually $1$). The adjusted $R^2$ statistic attempts to correct for that added flexibility.
$R^2$ doesn't really mean too much on its own. A low value may be appropriate (that's the amount of variation that can legitimately be explained) or it may indicate a problem with lack of fit. A high value may indicate a particularly informative model, or one that is badly overfit. What constitutes a "low" or "high" $R^2$ will vary by subject matter, etc. Thus, these statistics are most useful in comparison. I gather you will fit the same model to multiple $Y$ variables, but if the $X$ variable and the model's functional form are the same each time, it won't make any difference whether you use $R^2$ or $R^2_{\rm adj}$, as long as you use the same one each time. As far as which should be reported in a paper, $R^2_{\rm adj}$ is probably ideal, but given its comparative nature, whichever is more common in your field would be appropriate.