Econometrics – How to Manually Calculate Standard Errors for Instrumental Variables

econometricsinstrumental-variablesstandard errorstata

I am working on statistical inference with instrumental variables (IV) following Wooldridge (2016) Introductory Econometrics, Ch. 15. I am using the Card data set (like the book), with wages as outcome ($y$), education as a endogenous continuous treatment ($x$) and distance to college as a binary IV ($z$).

I want to calculate the standard errors manually, and preferably additionally in matrix form using Mata. So far, I am able to calculate coefficients but I can't seem to obtain the correct standard errors and would be happy for input on this.

I obtain the point estimate for $\beta_{IV}$ with the Wald-estimator:

$\beta_{IV}=\frac{\mathbb{E}[y | z = 1]-\mathbb{E}[y | z = 0]}{\mathbb{E}[x | z = 1]-\mathbb{E}[x | z = 0]}$,

$\beta_{IV}=\frac{6.311401-6.155494}{13.52703-12.69801}=.18806$

Cross-checked with Stata's -ivregress-:

. ivregress 2sls y (x=z), nohe
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .1880626   .0262826     7.16   0.000     .1365496    .2395756
       _cons |   3.767472   .3487458    10.80   0.000     3.083943    4.451001
------------------------------------------------------------------------------

I now want to proceed by calculating the standard errors. Wooldridge (2016, p. 466) writes that standard errors for $\beta_{IV}$ is obtained by using the square root of the estimated asymptotic variance, where the latter is obtained by

$Var(\beta_{IV})=\frac{\sigma^{2}}{SST_{x} \cdot R^{2}_{x,z}}$

First, $SST_{x}$ is the total sum of squares for $x_{i}$, calculated by

. use http://pped.org/card.dta, clear // Load Card data set

. rename nearc4 z

. rename educ x

. rename lwage y

. * SSTx
. egen x_bar = mean(x)

. gen SSTx = (x-x_bar)^2

. quiet sum SSTx

. di r(sum)
21562.08

Second, $R^{2}_{x,z}$ is obtained from the regression output,

. reg x z, nohe 
------------------------------------------------------------------------------
           x |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |    .829019   .1036988     7.99   0.000     .6256912    1.032347
       _cons |   12.69801   .0856416   148.27   0.000     12.53009    12.86594
------------------------------------------------------------------------------

. di .829^2
.687241

Finally, $\sigma^{2}$ is the error variance given by $SSE/(n-k-1)$ where the squared estimate of errors (SSE) is obtained by $SSE = \sum{(y_{i}-\hat{y_{i}})^{2}}$. Wooldridge says to use the IV residuals $\hat{u_{i}}$ in calculating the error variance,

$\sigma^{2}=\frac{1}{(n-2)} \sum{\hat{u_{i}}^2}$

Which I calculate in Stata as,

. quiet reg x z

. predict x_hat
(option xb assumed; fitted values)

. quiet reg y x_hat, nohe

. predict iv_resid
(option xb assumed; fitted values)

. quiet sum iv_resid

. di r(sum)
18848.115

. di (18848.114)^2
3.553e+08

. gen sigma_squared = 3.553e+08

. tabstat sigma_squared, format(%20.2f)

    variable |      mean
-------------+----------
sigma_squa~d |        355300000.00
------------------------

. di (1/(3010-2))*355300000
118118.35

Thus, when finally I substitute the values into the equation for the variance of $\beta_{IV}$, I get

$Var(\beta_{IV})=\frac{118118.35}{21562.08 \cdot .687241}=7.9711$

I then calculate the standard error by following the formula for standard error (e.g. Wooldridge 2016, p. 50):

$\hat{\sigma} = \sqrt{\hat{\sigma}^{2}} \implies \sqrt{7.9711}=2.8233$

$se(\beta_{IV})=\frac{\sigma}{\sqrt{SST_{x}}} \implies \frac{2.8233}{\sqrt{21562.08}}=0.01922 $

I have used quite some time on this and it would really be helpful with some input on what I am doing wrong.

EDIT: Based on the formula provided by Drunk Deriving, I've tried to calculate SE in Mata

. use http://pped.org/card.dta, clear

. keep nearc4 educ lwage id 

. rename nearc4 Z

. rename educ X

. rename lwage y

. mata

: y=st_data(.,"y")

: X=st_data(.,"X")

: Z=st_data(.,"Z")

: X = X, J(rows(X),1,1) // Add constant

: Z = Z, J(rows(Z),1,1) // Add constant

: b_iv = luinv(Z'*X)*Z'*y

: e=y-X*b_iv

: v=luinv(Z'*X)*Z'e*e'*Z*luinv(Z'*X)

: xmean = mean(X)

: tss_x = sum((X :- xmean) :^ 2)

: se=sqrt(v)/tss_x

: t=b_iv:/se

: p=2*ttail(rows(X)-cols(X),abs(t))

: b_iv,se,t,p
                 1             2             3             4             5             6             7
    +---------------------------------------------------------------------------------------------------+
  1 |  .1880626042             .   1.69178e-17             .   1.11162e+16             .             0  |
  2 |  3.767472015   4.17102e-17             .   9.03251e+16             .             0             .  |
    +---------------------------------------------------------------------------------------------------+

: end

Best Answer

HEre it is an option

use http://pped.org/card.dta, clear
 keep nearc4 educ lwage id 
 rename nearc4 z
 rename educ x
 rename lwage y
 bysort z: sum y x

gen byte one=1
mata: 
    y=st_data(.,"y")
    x=st_data(.,"x one")
    z=st_data(.,"z one")
    xh=z*invsym(z'*z)*z'*x
    biv=invsym(xh'*xh)*xh'*y
    biv2=luinv(z'*x)*z'*y
    //residuals
    re=y-x*biv
    vcv=sum(re:^2)/(rows(y))*invsym(xh'*xh)
    vcv
end
ivregress 2sls y (x=z), 
matrix list e(V)

the main difference with your previous code is how errors are defined (re=y-x*biv) and that, ivregress Stata does not adjust for degrees of freedom. otherwise if you use the following:

mata:sum(re:^2)/(rows(y)-2)*invsym(xh'*xh)

you need to compare it to

ivregress 2sls y (x=z), small
Related Question