We have the model:
$HOURS = \beta_1 + \beta_2 \ln(WAGE) + \beta_3 EDUC + \beta_4 AGE + \beta_5 KIDSL6 + \beta_6 KIDS618 + \beta_7 NWIFEINC + e$
Does anyone know why the Wu-Hausman test indicates that the regressor is endogenous when I use some variables as instruments, but not when I use others? Isn't endogeneity something that already exists in the OLS model and depends only on the regressors (not on the instruments)?
As the log below shows, with (exper, exper2) or siblings as instruments the Wu-Hausman test indicates endogeneity, while with mothereduc, fathereduc, or heduc as instruments it does not. Why is this?
I'm under the impression that endogeneity either exists or does not exist regardless of which instruments you choose.
. reg hours lwage $x2list, vce(robust)
Linear regression Number of obs = 428
F( 6, 421) = 3.93
Prob > F = 0.0008
R-squared = 0.0670
Root MSE = 755.16
------------------------------------------------------------------------------
| Robust
hours | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lwage | -17.4078 81.37728 -0.21 0.831 -177.3642 142.5486
educ | -14.44486 18.21292 -0.79 0.428 -50.24445 21.35473
age | -7.729976 5.849662 -1.32 0.187 -19.22816 3.768206
kidsl6 | -342.5048 131.7733 -2.60 0.010 -601.5205 -83.48919
kids618 | -115.0205 29.50866 -3.90 0.000 -173.0232 -57.01786
nwifeinc | -.0042458 .0032235 -1.32 0.189 -.0105821 .0020904
_cons | 2114.697 350.3186 6.04 0.000 1426.106 2803.289
------------------------------------------------------------------------------
. estimate store REG
. ivregress 2sls hours (lwage = exper exper2) $x2list, vce(robust) first
First-stage regressions
-----------------------
Number of obs = 428
F( 7, 420) = 12.62
Prob > F = 0.0000
R-squared = 0.1641
Adj R-squared = 0.1502
Root MSE = 0.6667
------------------------------------------------------------------------------
| Robust
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .0998844 .0141577 7.06 0.000 .0720556 .1277131
age | -.0035204 .0061766 -0.57 0.569 -.0156613 .0086205
kidsl6 | -.0558725 .1061345 -0.53 0.599 -.2644936 .1527485
kids618 | -.0176484 .0295136 -0.60 0.550 -.0756611 .0403642
nwifeinc | 5.69e-06 2.76e-06 2.07 0.039 2.75e-07 .0000111
exper | .0407097 .0153088 2.66 0.008 .0106183 .0708012
exper2 | -.0007473 .0004093 -1.83 0.069 -.0015519 .0000572
_cons | -.3579972 .3221853 -1.11 0.267 -.9912938 .2752995
------------------------------------------------------------------------------
Instrumental variables (2SLS) regression Number of obs = 428
Wald chi2(6) = 15.41
Prob > chi2 = 0.0173
R-squared = .
Root MSE = 1291.2
------------------------------------------------------------------------------
| Robust
hours | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lwage | 1544.818 598.8004 2.58 0.010 371.1913 2718.446
educ | -177.449 66.84514 -2.65 0.008 -308.463 -46.4349
age | -10.78409 10.57756 -1.02 0.308 -31.51573 9.947557
kidsl6 | -210.8339 203.9118 -1.03 0.301 -610.4936 188.8258
kids618 | -47.55708 56.47944 -0.84 0.400 -158.2547 63.14058
nwifeinc | -.0092491 .0052314 -1.77 0.077 -.0195025 .0010042
_cons | 2432.198 611.223 3.98 0.000 1234.223 3630.173
------------------------------------------------------------------------------
Instrumented: lwage
Instruments: educ age kidsl6 kids618 nwifeinc exper exper2
. estimate store REGIV
. esttab REG REGIV , se(%12.4f) b(%12.5f) star(* 0.10 ** 0.05 *** 0.01) mtitles("OLS" "IV") title("Model test")
Model test
--------------------------------------------
(1) (2)
OLS IV
--------------------------------------------
lwage -17.40780 1544.81848***
(81.3773) (598.8004)
educ -14.44486 -177.44896***
(18.2129) (66.8451)
age -7.72998 -10.78409
(5.8497) (10.5776)
kidsl6 -342.50482*** -210.83387
(131.7733) (203.9118)
kids618 -115.02051*** -47.55708
(29.5087) (56.4794)
nwifeinc -0.00425 -0.00925*
(0.0032) (0.0052)
_cons 2114.69725*** 2432.19773***
(350.3186) (611.2230)
--------------------------------------------
N 428 428
--------------------------------------------
Standard errors in parentheses
* p<0.10, ** p<0.05, *** p<0.01
.
end of do-file
. do "C:\Users\Patrick\AppData\Local\Temp\STD03000000.tmp"
.
. estat endogenous
Tests of endogeneity
Ho: variables are exogenous
Robust score chi2(1) = 22.2071 (p = 0.0000)
Robust regression F(1,420) = 26.355 (p = 0.0000)
. estat overid
Test of overidentifying restrictions:
Score chi2(1) = 1.23424 (p = 0.2666)
.
end of do-file
Above I run the IV regression with exper and exper2 as instruments for lwage. The test rejects the null hypothesis that lwage is exogenous (p = 0.0000).
. ivregress 2sls hours (lwage = mothereduc) $x2list, vce(robust) first
First-stage regressions
-----------------------
Number of obs = 428
F( 6, 421) = 12.33
Prob > F = 0.0000
R-squared = 0.1380
Adj R-squared = 0.1257
Root MSE = 0.6762
------------------------------------------------------------------------------
| Robust
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .1154201 .0155732 7.41 0.000 .0848091 .146031
age | -4.87e-06 .0053543 -0.00 0.999 -.0105294 .0105196
kidsl6 | -.095265 .1084798 -0.88 0.380 -.3084945 .1179645
kids618 | -.0433942 .028462 -1.52 0.128 -.0993396 .0125511
nwifeinc | 3.21e-06 2.64e-06 1.22 0.225 -1.98e-06 8.41e-06
mothereduc | -.0199982 .0115711 -1.73 0.085 -.0427425 .0027461
_cons | -.0692528 .3233721 -0.21 0.831 -.7048779 .5663723
------------------------------------------------------------------------------
Instrumental variables (2SLS) regression Number of obs = 428
Wald chi2(6) = 23.90
Prob > chi2 = 0.0005
R-squared = 0.0610
Root MSE = 751.35
------------------------------------------------------------------------------
| Robust
hours | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lwage | -106.4097 590.134 -0.18 0.857 -1263.051 1050.232
educ | -5.158324 64.91348 -0.08 0.937 -132.3864 122.0698
age | -7.55598 5.979866 -1.26 0.206 -19.2763 4.164342
kidsl6 | -350.0063 135.5988 -2.58 0.010 -615.775 -84.23754
kids618 | -118.864 37.42259 -3.18 0.001 -192.2109 -45.51706
nwifeinc | -.0039608 .0038792 -1.02 0.307 -.0115639 .0036424
_cons | 2096.609 382.8154 5.48 0.000 1346.304 2846.913
------------------------------------------------------------------------------
Instrumented: lwage
Instruments: educ age kidsl6 kids618 nwifeinc mothereduc
. estimate store IVREGmother
. estat endogenous
Tests of endogeneity
Ho: variables are exogenous
Robust score chi2(1) = .023462 (p = 0.8783)
Robust regression F(1,420) = .023051 (p = 0.8794)
. esttab REG IVREGmother , se(%12.4f) b(%12.5f) star(* 0.10 ** 0.05 *** 0.01) mtitles("ols" "IVmother") title("Model test")
Model test
--------------------------------------------
(1) (2)
ols IVmother
--------------------------------------------
lwage -17.40780 -106.40969
(81.3773) (590.1340)
educ -14.44486 -5.15832
(18.2129) (64.9135)
age -7.72998 -7.55598
(5.8497) (5.9799)
kidsl6 -342.50482*** -350.00627***
(131.7733) (135.5988)
kids618 -115.02051*** -118.86399***
(29.5087) (37.4226)
nwifeinc -0.00425 -0.00396
(0.0032) (0.0039)
_cons 2114.69725*** 2096.60887***
(350.3186) (382.8154)
--------------------------------------------
N 428 428
--------------------------------------------
Standard errors in parentheses
* p<0.10, ** p<0.05, *** p<0.01
.
end of do-file
. do "C:\Users\Patrick\AppData\Local\Temp\STD03000000.tmp"
. ivregress 2sls hours (lwage = fathereduc) $x2list, vce(robust) first
First-stage regressions
-----------------------
Number of obs = 428
F( 6, 421) = 12.90
Prob > F = 0.0000
R-squared = 0.1364
Adj R-squared = 0.1241
Root MSE = 0.6768
------------------------------------------------------------------------------
| Robust
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .1141899 .0151482 7.54 0.000 .0844144 .1439654
age | .0010194 .0052505 0.19 0.846 -.0093011 .0113399
kidsl6 | -.0875674 .1083449 -0.81 0.419 -.3005317 .1253969
kids618 | -.0458046 .0285587 -1.60 0.109 -.1019399 .0103308
nwifeinc | 3.48e-06 2.68e-06 1.30 0.195 -1.79e-06 8.75e-06
fathereduc | -.0163147 .010337 -1.58 0.115 -.0366333 .0040039
_cons | -.1432883 .3155694 -0.45 0.650 -.7635762 .4769996
------------------------------------------------------------------------------
Instrumental variables (2SLS) regression Number of obs = 428
Wald chi2(6) = 19.99
Prob > chi2 = 0.0028
R-squared = .
Root MSE = 856.52
------------------------------------------------------------------------------
| Robust
hours | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lwage | 599.808 775.6417 0.77 0.439 -920.4217 2120.038
educ | -78.84572 84.45428 -0.93 0.351 -244.3731 86.68163
age | -8.936616 6.77463 -1.32 0.187 -22.21465 4.341414
kidsl6 | -290.4833 159.9655 -1.82 0.069 -604.01 23.04338
kids618 | -88.36656 46.33123 -1.91 0.056 -179.1741 2.440976
nwifeinc | -.0062226 .0042903 -1.45 0.147 -.0146314 .0021863
_cons | 2240.138 416.825 5.37 0.000 1423.176 3057.1
------------------------------------------------------------------------------
Instrumented: lwage
Instruments: educ age kidsl6 kids618 nwifeinc fathereduc
. estimate store IVREGfather
. estat endogenous
Tests of endogeneity
Ho: variables are exogenous
Robust score chi2(1) = .763741 (p = 0.3822)
Robust regression F(1,420) = .756369 (p = 0.3850)
. esttab REG IVREGfather , se(%12.4f) b(%12.5f) star(* 0.10 ** 0.05 *** 0.01) mtitles("ols" "IVfather") title("Model test")
Model test
--------------------------------------------
(1) (2)
ols IVfather
--------------------------------------------
lwage -17.40780 599.80803
(81.3773) (775.6417)
educ -14.44486 -78.84572
(18.2129) (84.4543)
age -7.72998 -8.93662
(5.8497) (6.7746)
kidsl6 -342.50482*** -290.48330*
(131.7733) (159.9655)
kids618 -115.02051*** -88.36656*
(29.5087) (46.3312)
nwifeinc -0.00425 -0.00622
(0.0032) (0.0043)
_cons 2114.69725*** 2240.13767***
(350.3186) (416.8250)
--------------------------------------------
N 428 428
--------------------------------------------
Standard errors in parentheses
* p<0.10, ** p<0.05, *** p<0.01
.
end of do-file
. do "C:\Users\Patrick\AppData\Local\Temp\STD03000000.tmp"
. ivregress 2sls hours (lwage = heduc) $x2list, vce(robust) first
First-stage regressions
-----------------------
Number of obs = 428
F( 6, 421) = 12.07
Prob > F = 0.0000
R-squared = 0.1354
Adj R-squared = 0.1230
Root MSE = 0.6772
------------------------------------------------------------------------------
| Robust
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .1180822 .0159596 7.40 0.000 .0867118 .1494527
age | .0014976 .0051916 0.29 0.773 -.0087071 .0117022
kidsl6 | -.0807521 .1109522 -0.73 0.467 -.2988414 .1373372
kids618 | -.0438408 .028807 -1.52 0.129 -.1004642 .0127827
nwifeinc | 4.35e-06 2.80e-06 1.55 0.121 -1.15e-06 9.85e-06
heduc | -.0195687 .0125636 -1.56 0.120 -.0442638 .0051265
_cons | -.1325044 .3141308 -0.42 0.673 -.7499645 .4849558
------------------------------------------------------------------------------
Instrumental variables (2SLS) regression Number of obs = 428
Wald chi2(6) = 20.02
Prob > chi2 = 0.0027
R-squared = .
Root MSE = 874.46
------------------------------------------------------------------------------
| Robust
hours | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lwage | 652.9985 904.9896 0.72 0.471 -1120.748 2426.745
educ | -84.39566 97.88184 -0.86 0.389 -276.2405 107.4492
age | -9.040602 7.200103 -1.26 0.209 -23.15254 5.071341
kidsl6 | -286.0002 173.2299 -1.65 0.099 -625.5245 53.52414
kids618 | -86.06958 51.85856 -1.66 0.097 -187.7105 15.57133
nwifeinc | -.0063929 .0045117 -1.42 0.156 -.0152356 .0024498
_cons | 2250.948 447.9813 5.02 0.000 1372.921 3128.975
------------------------------------------------------------------------------
Instrumented: lwage
Instruments: educ age kidsl6 kids618 nwifeinc heduc
. estimate store IVREGhusband
. estat endogenous
Tests of endogeneity
Ho: variables are exogenous
Robust score chi2(1) = .646754 (p = 0.4213)
Robust regression F(1,420) = .641961 (p = 0.4235)
With mothereduc, fathereduc, or heduc as the instrument (each used separately above), the test does not reject exogeneity.
. esttab REG IVREGhusband , se(%12.4f) b(%12.5f) star(* 0.10 ** 0.05 *** 0.01) mtitles("ols" "IVhusband") title("Model test")
Model test
--------------------------------------------
(1) (2)
ols IVhusband
--------------------------------------------
lwage -17.40780 652.99854
(81.3773) (904.9896)
educ -14.44486 -84.39566
(18.2129) (97.8818)
age -7.72998 -9.04060
(5.8497) (7.2001)
kidsl6 -342.50482*** -286.00018*
(131.7733) (173.2299)
kids618 -115.02051*** -86.06958*
(29.5087) (51.8586)
nwifeinc -0.00425 -0.00639
(0.0032) (0.0045)
_cons 2114.69725*** 2250.94790***
(350.3186) (447.9813)
--------------------------------------------
N 428 428
--------------------------------------------
Standard errors in parentheses
* p<0.10, ** p<0.05, *** p<0.01
.
end of do-file
. do "C:\Users\Patrick\AppData\Local\Temp\STD03000000.tmp"
. ivregress 2sls hours (lwage = siblings) $x2list, vce(robust) first
First-stage regressions
-----------------------
Number of obs = 428
F( 6, 421) = 11.25
Prob > F = 0.0000
R-squared = 0.1326
Adj R-squared = 0.1202
Root MSE = 0.6783
------------------------------------------------------------------------------
| Robust
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .1048841 .0146161 7.18 0.000 .0761544 .1336138
age | .0018242 .0052106 0.35 0.726 -.0084178 .0120661
kidsl6 | -.0842532 .1111352 -0.76 0.449 -.3027022 .1341957
kids618 | -.0439312 .028611 -1.54 0.125 -.1001695 .0123071
nwifeinc | 3.17e-06 2.70e-06 1.17 0.242 -2.14e-06 8.48e-06
siblings | -.0110307 .0136049 -0.81 0.418 -.0377727 .0157113
_cons | -.1668875 .3101863 -0.54 0.591 -.7765943 .4428193
------------------------------------------------------------------------------
Instrumental variables (2SLS) regression Number of obs = 428
Wald chi2(6) = 4.81
Prob > chi2 = 0.5689
R-squared = .
Root MSE = 2100
------------------------------------------------------------------------------
| Robust
hours | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lwage | 2896.553 3671.989 0.79 0.430 -4300.413 10093.52
educ | -318.4901 381.4712 -0.83 0.404 -1066.16 429.1796
age | -13.42669 17.76516 -0.76 0.450 -48.24577 21.39239
kidsl6 | -96.90406 471.7734 -0.21 0.837 -1021.563 827.7549
kids618 | 10.81645 187.3789 0.06 0.954 -356.4395 378.0724
nwifeinc | -.0135783 .0138106 -0.98 0.326 -.0406465 .0134899
_cons | 2706.919 1153.148 2.35 0.019 446.7899 4967.048
------------------------------------------------------------------------------
Instrumented: lwage
Instruments: educ age kidsl6 kids618 nwifeinc siblings
. estimate store IVREGsib
. estat endogenous
Tests of endogeneity
Ho: variables are exogenous
Robust score chi2(1) = 4.30171 (p = 0.0381)
Robust regression F(1,420) = 4.2976 (p = 0.0388)
. esttab REG IVREGsib , se(%12.4f) b(%12.5f) star(* 0.10 ** 0.05 *** 0.01) mtitles("ols" "IVsibling") title("Model test")
Model test
--------------------------------------------
(1) (2)
ols IVsibling
--------------------------------------------
lwage -17.40780 2896.55290
(81.3773) (3671.9886)
educ -14.44486 -318.49015
(18.2129) (381.4712)
age -7.72998 -13.42669
(5.8497) (17.7652)
kidsl6 -342.50482*** -96.90406
(131.7733) (471.7734)
kids618 -115.02051*** 10.81645
(29.5087) (187.3789)
nwifeinc -0.00425 -0.01358
(0.0032) (0.0138)
_cons 2114.69725*** 2706.91871**
(350.3186) (1153.1481)
--------------------------------------------
N 428 428
--------------------------------------------
Standard errors in parentheses
* p<0.10, ** p<0.05, *** p<0.01
.
end of do-file
. do "C:\Users\Patrick\AppData\Local\Temp\STD03000000.tmp"
. esttab REG REGIV IVREGmother IVREGfather IVREGhusband IVREGsib , se(%12.4f) b(%12.5f) star(* 0.10 ** 0.05 *** 0.01) mtitles("ols" "IV" "IVmother" "IVfather" "IVhusband" "IVsibl
> ing") title("Model test")
Model test
------------------------------------------------------------------------------------------------------------
(1) (2) (3) (4) (5) (6)
ols IV IVmother IVfather IVhusband IVsibling
------------------------------------------------------------------------------------------------------------
lwage -17.40780 1544.81848*** -106.40969 599.80803 652.99854 2896.55290
(81.3773) (598.8004) (590.1340) (775.6417) (904.9896) (3671.9886)
educ -14.44486 -177.44896*** -5.15832 -78.84572 -84.39566 -318.49015
(18.2129) (66.8451) (64.9135) (84.4543) (97.8818) (381.4712)
age -7.72998 -10.78409 -7.55598 -8.93662 -9.04060 -13.42669
(5.8497) (10.5776) (5.9799) (6.7746) (7.2001) (17.7652)
kidsl6 -342.50482*** -210.83387 -350.00627*** -290.48330* -286.00018* -96.90406
(131.7733) (203.9118) (135.5988) (159.9655) (173.2299) (471.7734)
kids618 -115.02051*** -47.55708 -118.86399*** -88.36656* -86.06958* 10.81645
(29.5087) (56.4794) (37.4226) (46.3312) (51.8586) (187.3789)
nwifeinc -0.00425 -0.00925* -0.00396 -0.00622 -0.00639 -0.01358
(0.0032) (0.0052) (0.0039) (0.0043) (0.0045) (0.0138)
_cons 2114.69725*** 2432.19773*** 2096.60887*** 2240.13767*** 2250.94790*** 2706.91871**
(350.3186) (611.2230) (382.8154) (416.8250) (447.9813) (1153.1481)
------------------------------------------------------------------------------------------------------------
N 428 428 428 428 428 428
------------------------------------------------------------------------------------------------------------
Standard errors in parentheses
* p<0.10, ** p<0.05, *** p<0.01
.
end of do-file
.
And lastly, with siblings as the instrument the test rejects exogeneity at the 5% level (p = 0.038). Why is this? I am running the same regression in every model, only with different instruments.
Best Answer
To understand your problem you first need to understand how the endogeneity test works. Suppose you have an outcome $y$ and an explanatory variable $x$ which you think is endogenous because it has some correlation with the error term, i.e. $$\begin{matrix}y_i &=& \alpha &+& \beta x_i &+& \epsilon_i & \\ & && & & \hspace{-1cm}\nwarrow & \hspace{-0.8cm} \nearrow \\ & & & & & corr & \end{matrix}$$ then you can use an instrument ($z$) to test whether this is actually true.
When you regress your endogenous variable on the instrument, this splits the variation of $x$ into an explained part (which we know is exogenous because the instrument $z$ is supposed to be exogenous) and an unexplained part:
$$x_i \quad = \underbrace{a \quad + \quad \pi z_i}_{\text{good variation} } \quad + \underbrace{\eta_i}_{\text{bad variation}}$$
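Now it is important to understand the required assumptions for a valid instrument:
1. Relevance: $Cov(z_i, x_i) \neq 0$, i.e. the instrument is sufficiently strongly correlated with the endogenous variable.
2. Exogeneity: $Cov(z_i, \epsilon_i) = 0$, i.e. the instrument is uncorrelated with the error term.

If either of these conditions fails, we do not succeed in separating out the exogenous variation in $x$ with our instrument, either because it is weak or because it is not exogenous itself.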
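Why both requirements matter (relevance, $Cov(z_i, x_i) \neq 0$, and exogeneity, $Cov(z_i, \epsilon_i) = 0$) can be seen from the simple IV estimator in the one-regressor model above. This is the standard textbook expression, sketched here without your additional control variables:

$$\widehat{\beta}_{IV} = \frac{\widehat{Cov}(z_i, y_i)}{\widehat{Cov}(z_i, x_i)} \;\xrightarrow{\;p\;}\; \beta + \frac{Cov(z_i, \epsilon_i)}{Cov(z_i, x_i)}$$

The second term vanishes only if the instrument is exogenous, and the ratio is well defined only if the instrument is relevant. When $Cov(z_i, x_i)$ is close to zero, even a small violation of exogeneity, or just sampling noise in the numerator, gets blown up.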
Your endogeneity test then takes the residuals from this regression, $\widehat{\eta}$, runs the regression $$y_i = \alpha + \beta x_i + \delta \widehat{\eta}_i + \epsilon_i$$ and tests $H_0:\delta=0$. If we reject this null, then the "bad" part of the variation in $x$ (which we separated out before into $\widehat{\eta}$) significantly affects the outcome, and therefore we suspect endogeneity.
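The two-step procedure can be sketched on simulated data. This is a minimal Python illustration using only the standard library, not a reproduction of your Stata results: the data-generating process, the parameter values (`pi`, `0.8`, `0.6`) and the helper names `ols`, `simulate` and `endogeneity_test` are all my own assumptions, and it uses conventional rather than robust standard errors. In both runs $x$ is endogenous by construction; only the strength of the instrument differs.

```python
import random

def ols(y, cols):
    """OLS of y on a constant plus the given columns.
    Returns (coefficients, conventional standard errors, residuals)."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    # Invert X'X by Gauss-Jordan elimination on the augmented matrix [X'X | I].
    aug = [xtx[a] + [1.0 if a == b else 0.0 for b in range(k)] for a in range(k)]
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(aug[r][p]))
        aug[p], aug[piv] = aug[piv], aug[p]
        d = aug[p][p]
        aug[p] = [v / d for v in aug[p]]
        for r in range(k):
            if r != p:
                f = aug[r][p]
                aug[r] = [vr - f * vp for vr, vp in zip(aug[r], aug[p])]
    inv = [row[k:] for row in aug]
    b = [sum(inv[a][c] * xty[c] for c in range(k)) for a in range(k)]
    resid = [y[i] - sum(X[i][a] * b[a] for a in range(k)) for i in range(n)]
    s2 = sum(e * e for e in resid) / (n - k)
    se = [(s2 * inv[a][a]) ** 0.5 for a in range(k)]
    return b, se, resid

def simulate(pi, n=1000, seed=0):
    """Hypothetical DGP: x is endogenous because eps depends on v; z is exogenous."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    v = [rng.gauss(0, 1) for _ in range(n)]
    x = [pi * zi + vi for zi, vi in zip(z, v)]          # pi controls instrument strength
    eps = [0.8 * vi + rng.gauss(0, 0.6) for vi in v]    # error correlated with x through v
    y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, eps)]
    return y, x, z

def endogeneity_test(y, x, z):
    """Regression-based test: first stage, then y on x and the first-stage residuals."""
    b1, se1, eta_hat = ols(x, [z])
    first_stage_F = (b1[1] / se1[1]) ** 2               # F = t^2 with a single instrument
    b2, se2, _ = ols(y, [x, eta_hat])
    t_delta = b2[2] / se2[2]                            # t-statistic for H0: delta = 0
    return first_stage_F, t_delta

F_strong, t_strong = endogeneity_test(*simulate(pi=1.0))
F_weak, t_weak = endogeneity_test(*simulate(pi=0.02))
print(f"strong instrument: first-stage F = {F_strong:.1f}, t(delta) = {t_strong:.2f}")
print(f"weak instrument:   first-stage F = {F_weak:.1f}, t(delta) = {t_weak:.2f}")
```

With the strong instrument the test should pick up the endogeneity that was built in. With the weak one, $\widehat{\eta}$ is almost the same as $x$ itself, so the statistic on $\delta$ says little either way, even though the regressor is exactly as endogenous as before.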
Now you see why it matters which instrument we use for this test. In your example several instruments fail to meet condition 1 (relevance) because they are not sufficiently strongly correlated with the endogenous variable. For example, mother's education, father's education, and husband's education have first-stage F-statistics (i.e. the square of the t-statistic in the case of a single instrument) of 2.99, 2.50, and 2.43, respectively. Typically we worry about instruments with F-statistics below 10, so these three are unlikely to be good instruments, and any endogeneity test built on them will not be reliable either.
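The arithmetic behind these F-statistics can be checked directly against the first-stage tables in your log. A small Python sketch: the t-values are copied from the log (they are rounded to two decimals there, so the F-values are approximate), the variable names are mine, and I have added siblings for comparison.

```python
# t-statistics on the single excluded instrument, copied from the first-stage
# tables in the log (robust standard errors, one instrument per regression).
first_stage_t = {
    "mothereduc": -1.73,
    "fathereduc": -1.58,
    "heduc": -1.56,
    "siblings": -0.81,
}

RULE_OF_THUMB = 10.0  # common threshold for the first-stage F-statistic

# With exactly one excluded instrument, the first-stage F equals the squared t.
first_stage_F = {name: t ** 2 for name, t in first_stage_t.items()}

for name, F in first_stage_F.items():
    verdict = "weak" if F < RULE_OF_THUMB else "ok"
    print(f"{name:<11} F = {F:5.2f}  ({verdict})")
```

By this rule of thumb siblings (F of roughly 0.66) is the weakest of the four, so the rejection it produces (p = 0.038) deserves no more trust than the non-rejections from the three education instruments.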