How Residuals of Instrumental Variables Estimation are calculated and why you can have a negative R-squared

Tags: 2sls, r-squared, regression, residuals, stata

I would like to understand precisely why a 2SLS estimation, such as the one run by Stata commands like ivreg2, can report a negative $R^2$. This behaviour is documented for ivregress here: http://tinyurl.com/qbb9o9s. That note states that the residuals are calculated with the endogenous variables of the structural model, and also that the estimation does not have a nested constant-only version of the model. I guess this last detail is what allows the $R^2$ to be negative. I understand that an estimation without a constant can produce a negative $R^2$, but I am not sure whether lacking a nested constant-only version of the model is in any way similar. Could someone help me understand how the residuals are calculated?

Best Answer

First of all, ask yourself whether your instruments are actually strong enough to warrant the use of 2SLS. As you perhaps know from Bound et al. (1995), with weak instruments the 2SLS estimates can be badly biased and inconsistent. Moreover, you should run an F test on the excluded instruments in the first stage and check whether the statistic is at least about ten, the usual rule of thumb.
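As a rough illustration of the first-stage check, here is a minimal Python sketch for the simplest case of one endogenous regressor and one instrument. The data are invented toy numbers, the function name `first_stage_f` is hypothetical, and the statistic assumes homoskedastic errors (with a single instrument it is just the squared t statistic on the instrument). In Stata you would read this off the first-stage output rather than compute it by hand.

```python
# Toy data, invented purely for illustration (not taken from the question).
z = [0.0, 0.0, 1.0, 1.0]   # instrument
x = [0.0, 2.0, 1.0, 3.0]   # endogenous regressor


def first_stage_f(x, z):
    """Homoskedastic F statistic for H0: pi1 = 0 in x = pi0 + pi1*z + v."""
    n = len(x)
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    s_zx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    pi1 = s_zx / s_zz                      # first-stage slope
    pi0 = x_bar - pi1 * z_bar              # first-stage intercept
    rss = sum((xi - pi0 - pi1 * zi) ** 2 for xi, zi in zip(x, z))
    sigma2 = rss / (n - 2)                 # residual variance, 2 parameters
    return pi1 ** 2 * s_zz / sigma2        # equals t^2 with one instrument


f_stat = first_stage_f(x, z)
print(f_stat)  # far below 10 for this toy data, so the instrument is weak
```

With these numbers the statistic is well under ten, which is the situation where the 2SLS point estimates should not be trusted.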

Even better, use test statistics that are robust to weak instruments. ivreg2 and condivreg have some available, but only for one endogenous regressor under conditional homoskedasticity. The $R^2$ value is usually useless for inference. Check whether your coefficients are statistically significant, first using a t-test and then using Anderson-Rubin confidence intervals as given by condivreg.

These intervals may be unbounded, which is itself a sign that your instruments are weak.
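On the question of how the residuals are computed: the reported 2SLS residuals are $y - \hat\alpha - \hat\beta x$, using the actual endogenous regressor $x$ rather than its first-stage fitted value $\hat{x}$. Because the 2SLS coefficients do not minimize this residual sum of squares, it can exceed the total sum of squares, and then $R^2 = 1 - RSS/TSS$ is negative even though the model includes a constant. The sketch below shows this in Python for one regressor and one instrument. The data are invented toy numbers and the helper names are hypothetical; this is not how Stata implements it internally.

```python
# Toy data, invented purely for illustration (not taken from the question).
z = [0.0, 0.0, 1.0, 1.0]    # instrument
x = [0.0, 2.0, 1.0, 3.0]    # endogenous regressor
y = [0.0, 4.0, -1.0, 3.0]   # outcome


def mean(v):
    return sum(v) / len(v)


def cross(a, b):
    """Sum of cross-products of deviations from the means."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


def r_squared(y, fitted):
    """1 - RSS/TSS, with TSS taken around the mean of y."""
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = cross(y, y)
    return 1.0 - rss / tss


# Just-identified IV (2SLS) estimates of y = alpha + beta*x + u.
beta = cross(z, y) / cross(z, x)
alpha = mean(y) - beta * mean(x)

# First-stage fitted values x_hat = pi0 + pi1*z.
pi1 = cross(z, x) / cross(z, z)
pi0 = mean(x) - pi1 * mean(z)
x_hat = [pi0 + pi1 * zi for zi in z]

# Reported residuals use the ACTUAL x, so RSS can exceed TSS.
r2_structural = r_squared(y, [alpha + beta * xi for xi in x])

# The second-stage OLS fit uses x_hat; its R^2 cannot be negative.
r2_second_stage = r_squared(y, [alpha + beta * xh for xh in x_hat])

print(r2_structural, r2_second_stage)
```

Here the $R^2$ based on the actual $x$ is negative, while the one based on $\hat{x}$ is small but non-negative, since the second stage is an ordinary least-squares fit with a constant. The negative value says nothing about whether the estimates are good or bad; it only reflects that 2SLS is not choosing coefficients to fit $y$ as closely as possible.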
