Why report R-squared in Instrumental Variables Estimation?

instrumental-variables, r-squared, stata, sums-of-squares

I mean the R-squared calculated as $R^2=1-\frac{RSS}{TSS}$ when you use the $RSS$ from the original structural model, not the recalculated $RSS$ that you would need in order to do an F-test. With said $R^2$, you do not have a proper interpretation of the statistic, as I understand it. So why report it? I am familiar with Stata reporting it in commands such as ivreg2, and I think other software packages do it too.

On another note, why is it more popular to compute $R^2$ as $R^2=1-\frac{RSS}{TSS}$ rather than $R^2=\frac{MSS}{TSS}$?

Best Answer

It's true that $R^2$ in instrumental variables regression is not useful. Since one of the explanatory variables $x$ is correlated with the error $\epsilon$, we cannot decompose the variance of the outcome $y$ into $\beta^2 Var(x) + Var(\epsilon)$, so the reported $R^2$ has no natural interpretation, nor can it be used to compute F-tests of joint significance. Moreover, $R^2$ in instrumental variables regression can be negative, and on this point it makes no difference whether you use $$R^2 = \frac{MSS}{TSS} \quad \text{or} \quad R^2 = 1- \frac{RSS}{TSS},$$ because when $RSS>TSS$ we also have $MSS = TSS - RSS < 0$. In general the two expressions are identical, so there is no reason why one should be more popular than the other. The issue is discussed at greater length in the Stata website resources and support FAQs (link).
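Both points can be checked numerically. Here is a small self-contained sketch in Python (hypothetical toy data, standing in for a proper Stata run): a just-identified IV fit where the two $R^2$ formulas coincide and the resulting value is negative.

```python
# Hypothetical toy data: just-identified IV with one instrument z and one
# endogenous regressor x, computed by hand via the Wald/IV formula
#   beta_IV = Cov(z, y) / Cov(z, x),  intercept = ybar - beta * xbar.

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    ub, vb = mean(u), mean(v)
    return sum((a - ub) * (b - vb) for a, b in zip(u, v)) / len(u)

z = [0, 1, 2, 3]
x = [0, 1, 0, 1]
y = [0, 0, 1, 1]

beta = cov(z, y) / cov(z, x)        # 0.5 / 0.25 = 2.0
alpha = mean(y) - beta * mean(x)    # -0.5

# Corrected residuals use the ACTUAL x, not a first-stage fit:
e = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
RSS = sum(ei ** 2 for ei in e)                     # 5.0
TSS = sum((yi - mean(y)) ** 2 for yi in y)         # 1.0
MSS = TSS - RSS                                    # -4.0

print(1 - RSS / TSS)  # -4.0: the IV R-squared is negative here
print(MSS / TSS)      # -4.0: both formulas give the same value
```

Because $MSS$ is defined as $TSS - RSS$, the two formulas agree mechanically; what the toy data adds is a concrete case where $RSS > TSS$ and the shared value goes below zero.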

[edit] to address the additional question in the comment
When you instrument the endogenous variable $x$ with your instrument $z$ as $$x = \alpha + \pi z + \eta,$$ you use the predicted values $\widehat{x}$ in the second stage $$y = a + \beta \widehat{x} + \epsilon.$$ If you do this procedure by hand in Stata, like

reg x z              // first stage
predict x_hat, xb    // fitted values of x
reg y x_hat          // second stage (standard errors will be wrong)

the standard errors will be based on the residuals $y - \widehat{x}\beta$, but these standard errors are wrong. They are wrong because $\widehat{x}$ is itself an estimated quantity, and the second-stage regression treats it as if it were fixed data, ignoring the estimation error from the first stage. These residuals do, however, have the property that $RSS < TSS$, so there can be no negative $R^2$: $\widehat{x}\beta$ is always at least as good a predictor of $y$ as $\overline{y}$.

To calculate the corrected standard errors, you use the actual values of the endogenous variable $x$, not its fitted values, when computing the residual $e = y − x\beta$ (this is what Stata's ivregress 2sls and ivreg2 do). The catch is that you are then computing the $RSS$ from a different set of regressors than the ones used to fit the model from which the $TSS$ is taken. For this reason it can happen that $x\beta$ is a worse predictor of $y$ than $\overline{y}$, which is exactly when the $R^2$ turns negative.
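The contrast between the two residual definitions can be sketched numerically (hypothetical toy data, Python standing in for the Stata steps): the same coefficient comes out of both, but the "naive" residuals from regressing $y$ on $\widehat{x}$ keep $RSS \le TSS$, while the corrected residuals built from the actual $x$ can blow past $TSS$.

```python
# Hypothetical toy data: manual 2SLS, contrasting the naive residuals
# (y regressed on the fitted x_hat) with the corrected residuals that
# use the actual endogenous x. Same slope, very different RSS.

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    ub, vb = mean(u), mean(v)
    return sum((a - ub) * (b - vb) for a, b in zip(u, v)) / len(u)

z = [0, 1, 2, 3]
x = [0, 1, 0, 1]
y = [0, 0, 1, 1]

# First stage: regress x on z, keep fitted values x_hat.
pi = cov(z, x) / cov(z, z)                              # 0.2
x_hat = [mean(x) + pi * (zi - mean(z)) for zi in z]     # [0.2, 0.4, 0.6, 0.8]

# Second stage: regress y on x_hat (reproduces the IV slope).
beta = cov(x_hat, y) / cov(x_hat, x_hat)                # 2.0
alpha = mean(y) - beta * mean(x_hat)                    # -0.5

TSS = sum((yi - mean(y)) ** 2 for yi in y)              # 1.0
RSS_naive = sum((yi - alpha - beta * xh) ** 2
                for xh, yi in zip(x_hat, y))            # 0.2
RSS_correct = sum((yi - alpha - beta * xi) ** 2
                  for xi, yi in zip(x, y))              # 5.0

print(round(1 - RSS_naive / TSS, 10))    # 0.8  -> naive R^2 looks fine
print(round(1 - RSS_correct / TSS, 10))  # -4.0 -> corrected R^2 is negative
```

The naive second stage is an ordinary OLS fit, so by construction its $RSS$ cannot exceed $TSS$; swapping the actual $x$ back in for the residual calculation is what opens the door to $RSS > TSS$.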