Solved – Can I ignore a negative R-squared value when using instrumental variable regression?

endogeneity, goodness-of-fit, instrumental-variables, r, r-squared

I am running an instrumental variable regression using the `ivreg` function in R.

All of my validity tests related to endogeneity are satisfied; the only exception is the R-squared value, which is negative.

Can I ignore this negative R-squared value and leave it unreported?

If not, what is an alternative way to resolve this issue? The code is below:

    > Y_ivreg=ivreg(Y~x1+x2+x3+x4+x5+x6+x7|x2+x8+x9+x10+x5+x6+x7,data=DATA)
    > summary(Y_ivreg,diagnostics=TRUE)

    Call:
    ivreg(formula = Y ~ x1 + x2 + x3 + x4 + x5 + 
        x6 + x7 | x2 + x8 + x9 + x10 + 
        x5 + x6 + x7, data = DATA)

    Residuals:
          Min        1Q    Median        3Q       Max 
    -0.747485 -0.053721 -0.009349  0.044285  1.085256 

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
    (Intercept)  0.0979178  0.0319244   3.067  0.00218 ** 
    x1           0.0008438  0.0004927   1.712  0.08691 .  
    x2           0.0018515  0.0009135   2.027  0.04277 *  
    x3          -0.0130133  0.0073484  -1.771  0.07667 .  
    x4          -0.0018486  0.0009552  -1.935  0.05303 .  
    x5          -0.0000294  0.0000126  -2.333  0.01971 *  
    x6           0.0018214  0.0008908   2.045  0.04096 *  
    x7          -0.0024457  0.0005488  -4.456 8.61e-06 ***

    Diagnostic tests:
                           df1  df2 statistic p-value    
    Weak instruments (x1)    3 3313   185.440  <2e-16 ***
    Weak instruments (x3)    3 3313  3861.526  <2e-16 ***
    Weak instruments (x4)    3 3313  3126.315  <2e-16 ***
    Wu-Hausman               3 3310     1.943   0.121    
    Sargan                   0   NA        NA      NA    
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.1142 on 3313 degrees of freedom
    Multiple R-Squared: -0.009029,  Adjusted R-squared: -0.01116 
    Wald test: 4.231 on 7 and 3313 DF,  p-value: 0.0001168 

For reference, here is a Stata FAQ that discusses this issue and IV regression:
https://www.stata.com/support/faqs/statistics/two-stage-least-squares/

Best Answer

Yes. The linked Stata FAQ answers your question in a single sentence:

$R^2$ really has no statistical meaning in the context of 2SLS/IV.


How can $R^2$ be negative?

Wikipedia has a great visualization of $R^2$:

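[Figure from Wikipedia. Left: squared deviations of the data from the mean $\bar{y}$ (red). Right: squared residuals from the model's predictions (blue).]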

On the left, we see the $\color{red}{\text{total sum of squares}}$, obtained by using the mean ($\bar{y}$) as a prediction:

$${\text{total sum of squares}} = \sum_{i = 1}^n (y_i - \bar{y})^2$$

On the right, we see the $\color{blue}{\text{residual sum of squares}}$, obtained by using the model's predictions ($\hat{y}_i$):

$${\text{residual sum of squares}} = \sum_{i = 1}^n (y_i - \hat{y}_i)^2 = \sum_{i = 1}^n \bigg(y_i - \Big( \hat{\beta}_0 + \sum_{j = 1}^p \hat{\beta}_j \cdot x_{ij} \Big) \bigg)^2$$

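Ordinarily, that is, for a least-squares fit with an intercept ($\beta_0$), $R^2 = 1 - \frac{\color{blue}{\text{residual sum of squares}}}{\color{red}{\text{total sum of squares}}} \geq 0$, because such a model can always perform at least as well as the image on the left: setting the intercept to the mean and every slope to zero reproduces the total sum of squares, and least squares can only improve on that.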
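As a quick check of this claim, here is a minimal R sketch on simulated data. The variables `x` and `y` are made up for illustration and have nothing to do with the asker's `DATA`. It computes both sums of squares by hand for an OLS fit with an intercept:

```r
# Simulated data (hypothetical, for illustration only)
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)

fit <- lm(y ~ x)                # OLS with an intercept

tss <- sum((y - mean(y))^2)     # total sum of squares (mean as prediction)
rss <- sum(residuals(fit)^2)    # residual sum of squares (model predictions)
r2  <- 1 - rss / tss

# The hand-computed value agrees with summary(), and rss cannot exceed tss
print(all.equal(r2, summary(fit)$r.squared))
print(rss <= tss)
```

Both checks hold for any OLS fit that includes an intercept, not just for this particular seed.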

However, if you interpret instrumental variable regression as a two-stage least squares (2SLS) regression, it is easy to show why $R^2$ could end up being negative. Namely, suppose the endogenous variables ($\mathbf{X}$) are regressed on the exogenous variables ($\mathbf{Z}$), and the predicted values ($\hat{\mathbf{X}}$) are then used as covariates in the second stage:

$$\begin{aligned} \text{Stage 1:} \quad & \mathbf{X} = \mathbf{Z}\mathbf{\delta} + \text{error} \\ \text{Stage 2:} \quad & \mathbf{y} = \hat{\mathbf{X}}\mathbf{\beta} + \text{error} \end{aligned}$$

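Since $\hat{\mathbf{X}} \neq \mathbf{X}$, the error that is minimized in the second stage, $\mathbf{y} - \hat{\mathbf{X}}\hat{\mathbf{\beta}}$, is not the same as the error used to calculate the residual sum of squares, $\mathbf{y} - \mathbf{X}\hat{\mathbf{\beta}}$, which plugs the original $\mathbf{X}$ back in. Consequently, the residual sum of squares is no longer guaranteed to be less than or equal to the total sum of squares, so $R^2$ can be negative. (And more importantly, the $R^2$ has become meaningless.)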
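To make the two different residuals concrete, here is a rough R sketch that runs both stages by hand with `lm` on simulated data. Everything in it (the instrument `z`, the error terms, the coefficients) is an assumption chosen for illustration; it is not the asker's model, and it is not how `ivreg` is implemented internally.

```r
# Simulated data (hypothetical): x is endogenous because it shares v with u
set.seed(42)
n <- 1000
z <- rnorm(n)                        # instrument
v <- rnorm(n)                        # first-stage error
x <- z + v                           # endogenous regressor
u <- -2 * v + rnorm(n, sd = 0.5)     # structural error, correlated with x
y <- 1 + 1 * x + u

# Stage 1: regress the endogenous variable on the instrument
stage1 <- lm(x ~ z)
x_hat  <- fitted(stage1)

# Stage 2: regress y on the predicted values
stage2  <- lm(y ~ x_hat)
beta_iv <- coef(stage2)

tss <- sum((y - mean(y))^2)

# Residuals that stage 2 actually minimizes (built from x_hat)
rss_stage2 <- sum(residuals(stage2)^2)

# Residuals used for the reported fit (built from the original x)
resid_iv <- y - (beta_iv[1] + beta_iv[2] * x)
rss_iv   <- sum(resid_iv^2)

r2_stage2 <- 1 - rss_stage2 / tss    # non-negative: plain OLS with intercept
r2_iv     <- 1 - rss_iv / tss        # negative in this simulation

print(c(r2_stage2 = r2_stage2, r2_iv = r2_iv))
```

The negative `r2_iv` here comes entirely from swapping `x_hat` for `x` when forming the residuals, while the coefficient estimates stay the same. Note that the standard errors printed by `summary(stage2)` would not be valid IV standard errors, which is one reason to use a dedicated function such as `ivreg` rather than this manual version.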
