Sample Selection Model: MLE vs. Two-Step Estimation and Panel Data

econometrics, panel data, stata

I am studying the fairly classical selection model for estimating the union wage premium, with two wage equations and one selection equation:

$\ln(w_{1it}) = X^{'}_{it} \beta_1 + \epsilon_{1it}$

$\ln(w_{0it}) = X^{'}_{it} \beta_0 + \epsilon_{0it}$

$union^*_{it} = X^{'}_{it} \gamma + \epsilon_{2it}$

$union_{it} = \mathbb{I}(union^*_{it}>0)$

I have two different questions:

1) When I implement it as a cross-sectional estimate in Stata (with the heckman command), I get very different results depending on whether I use the Heckman two-step method or MLE.
Is that normal? If so, what are the theoretical reasons?

2) As noted in my equations, I have panel data.
Does the Heckman procedure still apply to it? Or does within estimation remove the endogeneity and solve everything?
I would say it still applies, but I did not find relevant literature and my courses only cover cross-sectional data.

Best Answer

If I understand correctly, you are "tricking" the Heckman selection model into estimating an endogenous switching regression model, also known as the Roy model or Tobit Type 5. This trick is explained in Lee, Lung-Fei (1978) "Unionism and Wage Rates: A Simultaneous Equations Model with Qualitative and Limited Dependent Variables", International Economic Review, Vol. 19(2), pp. 415-433. You are interested in whether worker characteristics are rewarded differently in the two regimes/sectors ($\beta_0 - \beta_1 \ne 0$), and the correlation parameter $\rho$ tells you about the effect of self-selected union membership on wages in the two sectors. You may, however, need to adjust the standard errors if the Heckman technique is used, or you lose consistency of the variance estimates.

Alternatively, since you have an exclusion restriction, you can get the causal effect of union membership on wages using instrumental variables: treatreg (renamed etregress in Stata 13) for cross-sectional data, and the various panel IV methods such as xtivreg.
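A minimal sketch of the IV route, reusing the variable names from the movestay example dataset used further down (private as the endogenous treatment, m_s1 and job_hold as the excluded instruments). It is written for a do-file with the default line delimiter. The panel lines are commented out because they rely on hypothetical identifiers id and year that are not part of that dataset.

```stata
* Cross-section: linear regression with an endogenous binary treatment.
* A single set of wage coefficients is estimated for both sectors.
use "http://www.adeptanalytics.org/download/ado/movestay/movestay_example.dta", clear
etregress lmo_wage age age2 edu13 edu4 edu5 reg2 reg3 reg4, ///
    treat(private = age age2 edu13 edu4 edu5 reg2 reg3 reg4 m_s1 job_hold)

* Panel IV with fixed effects (id and year are hypothetical identifiers):
* xtset id year
* xtivreg lmo_wage age age2 (private = m_s1 job_hold), fe
```

Note that both commands estimate one treatment effect rather than separate coefficient vectors for the two regimes.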

Some observations. First, there is a user-written Stata command called movestay designed to estimate the endogenous switching regression model with cross-sectional data. It is a full information ML approach, which relies on the assumption of multivariate normality of the error terms, as does the Heckman MLE method. If this is satisfied, both will be consistent, though movestay will be somewhat more efficient than doing it in two parts.

The Heckman Two Step limited information ML estimator relies only on univariate normality of the marginal distribution, so it is expected to be more robust since that is a lower hurdle to clear. But if you do have joint normality, the Two Step is still consistent, but no longer efficient, especially relative to movestay. However, if you only have univariate normality, then the Two Step remains consistent while the FIML approaches are not.
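To make the limited-information point concrete, here is a sketch of the conditional means behind the second step, in the notation of the question. The label $Z_{it}$ for the selection-equation regressors ($X_{it}$ plus any excluded instruments) is an assumption introduced here, $Var(\epsilon_{2it})$ is normalized to 1, and $\rho_j$ and $\sigma_j$ denote the correlation of $\epsilon_{jit}$ with $\epsilon_{2it}$ and the standard deviation of $\epsilon_{jit}$:

$E[\ln(w_{1it}) \mid X_{it}, Z_{it}, union_{it}=1] = X^{'}_{it} \beta_1 + \rho_1 \sigma_1 \frac{\phi(Z^{'}_{it}\gamma)}{\Phi(Z^{'}_{it}\gamma)}$

$E[\ln(w_{0it}) \mid X_{it}, Z_{it}, union_{it}=0] = X^{'}_{it} \beta_0 - \rho_0 \sigma_0 \frac{\phi(Z^{'}_{it}\gamma)}{1-\Phi(Z^{'}_{it}\gamma)}$

The two-step estimator plugs probit estimates of $\gamma$ into these ratios and then runs OLS on each subsample. That only requires $\epsilon_{2it}$ to be normal and $E[\epsilon_{jit} \mid \epsilon_{2it}]$ to be linear in $\epsilon_{2it}$, whereas the ML estimators use the full joint density.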

In short, FIML and LIML approaches will usually differ since they have different information to work with, as I show below with an example. I think this explains question (1).

Now for (2). As far as I know, there is no off-the-shelf panel version of heckman or movestay, though both allow you to cluster the standard errors on the panel id. That's not strictly correct, but may be good enough. There might also be a way to hack it using gllamm, though I have never done this myself since it appears non-trivial. Some notes on that here and Statalist threads here.
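A minimal sketch of the clustering workaround for heckman. All names are placeholders: y is the outcome, x1 and x2 are regressors, z1 is an excluded instrument, and id is a hypothetical panel identifier. As far as I know, the clustered variance is available for the ML fit only, not together with the twostep option.

```stata
* Pooled Heckman ML, standard errors clustered on the panel identifier.
* y, x1, x2, z1 and id are placeholder names, not variables from the example data.
heckman y x1 x2, select(x1 x2 z1) vce(cluster id)
```

This leaves the point estimates at their pooled values and only changes the standard errors, which is why it is an approximation rather than a panel estimator.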

I am not really answering the second part of (2), since it is not clear to me how the fixed effects enter the model and how they are related to union membership. With the panel methods I suggested above, you give up estimating different parameters in the two regimes/sectors. Depending on the details of your model, you may not even need to instrument if you can difference away the pesky effect. The details depend on your data and models.

Finally, you might consider changing your notation to add instrument(s) Z (something that alters union membership, but is not related to wages directly) and the fixed effects.
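One way the revised notation could look, purely as a sketch: $Z_{it}$ holds the excluded instruments with coefficient vector $\delta$, and $\alpha_i$ and $\eta_i$ are worker fixed effects. The symbols $\delta$, $\alpha_i$ and $\eta_i$ are assumptions introduced here, and whether the effects really enter additively like this depends on the model you have in mind.

$\ln(w_{jit}) = X^{'}_{it} \beta_j + \alpha_i + \epsilon_{jit}, \quad j \in \{0, 1\}$

$union^*_{it} = X^{'}_{it} \gamma + Z^{'}_{it} \delta + \eta_i + \epsilon_{2it}$

$union_{it} = \mathbb{I}(union^*_{it}>0)$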

Here's some code showing the movestay and Heckman MLE equivalence, along with the problem you have in (1). I am modeling wages with endogenous participation in the public/union and private sectors. My instruments are marital status and the number of job holders in the household. They are probably not very good instruments.

Here's the output:

. #delimit;
delimiter now ;
. use "http://www.adeptanalytics.org/download/ado/movestay/movestay_example.dta", clear;
(Sample dataset to illustrate the use of movestay procedure)

. capture ssc install movestay;

. /* Switching Regression */
> movestay lmo_wage age age2 edu13 edu4 edu5 reg2 reg3 reg4
> , select(private =age age2 edu13 edu4 edu5 reg2 reg3 reg4 m_s1 job_hold);



Fitting initial values .....
initial:       log likelihood = -2615.3274
rescale:       log likelihood = -2615.3274
rescale eq:    log likelihood = -2504.2563
Iteration 0:   log likelihood = -2504.2563  
Iteration 1:   log likelihood = -2478.8632  
Iteration 2:   log likelihood = -2472.9261  
Iteration 3:   log likelihood = -2471.4714  
Iteration 4:   log likelihood =  -2470.979  
Iteration 5:   log likelihood =   -2470.94  
Iteration 6:   log likelihood = -2470.9304  
Iteration 7:   log likelihood = -2470.9304  

Endogenous switching regression model             Number of obs   =       2094
                                                  Wald chi2(8)    =     742.40
Log likelihood = -2470.9304                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
lmo_wage0    |
         age |  -.0370404   .0111445    -3.32   0.001    -.0588832   -.0151976
        age2 |   .0003735   .0001285     2.91   0.004     .0001216    .0006255
       edu13 |  -.5066122   .0885002    -5.72   0.000    -.6800694    -.333155
        edu4 |   -.410602   .0507909    -8.08   0.000    -.5101502   -.3110537
        edu5 |  -.2973613   .0391875    -7.59   0.000    -.3741673   -.2205552
        reg2 |  -.3780673   .0420359    -8.99   0.000    -.4604562   -.2956784
        reg3 |   .7053256   .0532104    13.26   0.000     .6010351    .8096162
        reg4 |  -.2355433   .0474621    -4.96   0.000    -.3285672   -.1425193
       _cons |   9.322335   .2377244    39.21   0.000     8.856404    9.788266
-------------+----------------------------------------------------------------
lmo_wage1    |
         age |   .0423454   .0291869     1.45   0.147      -.01486    .0995507
        age2 |  -.0005007   .0003227    -1.55   0.121    -.0011332    .0001319
       edu13 |    .343707   .2793213     1.23   0.219    -.2037528    .8911667
        edu4 |  -.1578067   .1608107    -0.98   0.326    -.4729898    .1573764
        edu5 |  -.1640915   .1300285    -1.26   0.207    -.4189427    .0907598
        reg2 |  -.2864992     .10977    -2.61   0.009    -.5016445    -.071354
        reg3 |   .7076898   .1427077     4.96   0.000     .4279879    .9873917
        reg4 |  -.1383784   .1414154    -0.98   0.328    -.4155475    .1387908
       _cons |   7.415708   .4807955    15.42   0.000     6.473366     8.35805
-------------+----------------------------------------------------------------
select       |
         age |  -.1455147   .0258919    -5.62   0.000     -.196262   -.0947674
        age2 |   .0013623   .0003045     4.47   0.000     .0007655    .0019592
       edu13 |   .0761837   .2457816     0.31   0.757    -.4055394    .5579067
        edu4 |   .0690438   .1415167     0.49   0.626    -.2083237    .3464114
        edu5 |   .2351347   .1063559     2.21   0.027      .026681    .4435883
        reg2 |  -.4401673   .0958095    -4.59   0.000    -.6279506   -.2523841
        reg3 |  -.5960666    .118727    -5.02   0.000    -.8287672   -.3633661
        reg4 |  -.6010511   .1127811    -5.33   0.000    -.8220979   -.3800043
        m_s1 |   .1569937   .0921423     1.70   0.088    -.0236019    .3375893
    job_hold |   .0551942    .036172     1.53   0.127    -.0157017      .12609
       _cons |   2.505468   .5789885     4.33   0.000     1.370672    3.640265
-------------+----------------------------------------------------------------
       /lns0 |  -.4220208   .0186565   -22.62   0.000    -.4585869   -.3854546
       /lns1 |  -.5903419    .056246   -10.50   0.000     -.700582   -.4801018
         /r0 |   1.353758   .0813975    16.63   0.000     1.194222    1.513295
         /r1 |   .1457211   .3195399     0.46   0.648    -.4805656    .7720077
-------------+----------------------------------------------------------------
      sigma0 |   .6557204   .0122335                      .6321763    .6801414
      sigma1 |   .5541378    .031168                      .4962964    .6187204
        rho0 |   .8749375   .0190865                      .8318838     .907522
        rho1 |   .1446983   .3128495                     -.4466964    .6480954
------------------------------------------------------------------------------
LR test of indep. eqns. :            chi2(2) =    86.94   Prob > chi2 = 0.0000
------------------------------------------------------------------------------

The $\rho$s are the union effects. $\rho_0$ is positive and significant, so folks who choose to work in the public sector earn lower wages in that sector than a random individual from this sample. Those working in the private sector fare no better or worse than a random individual. The signs are a tad counterintuitive, but the authors of the Stata code parameterized $\rho$ as negative (see the conditional expectations part of the movestay paper). The likelihood-ratio test for joint independence of the three equations is reported in the last line of the output. The "frontslashed" parameters are ancillary. Some folks prefer to multiply $\rho$ and $\sigma$ to get $\lambda$, with standard errors estimated using the delta method.
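The link between the ancillary parameters and the reported $\rho$, $\sigma$ and $\lambda$ can be checked by hand. A small sketch using the regime 0 numbers from the table above, written for the default line delimiter. The commented nlcom line assumes the ancillary equations are stored under the names r0 and lns0, which is a guess based on the /r0 and /lns0 labels; matrix list e(b) shows the actual names.

```stata
* rho0 = tanh(/r0); the table reports rho0 = .8749375
display tanh(1.353758)
* sigma0 = exp(/lns0); the table reports sigma0 = .6557204
display exp(-.4220208)
* lambda0 = rho0 * sigma0
display tanh(1.353758) * exp(-.4220208)

* Delta-method standard error, to be run right after movestay
* (equation names r0 and lns0 are assumed, not verified):
* nlcom (lambda0: tanh([r0]_b[_cons]) * exp([lns0]_b[_cons]))
```

The product is roughly 0.574, the same magnitude as the lambda that heckman reports for regime 0 further down, with the sign flipped by the different parameterization.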

Now you can get the same estimates using heckman. The signs on $\rho$ and on the selection-equation coefficients flip, because heckman treats the observations with non-missing wage0 (private == 0) as the selected sample, whereas movestay models private == 1. You see the same negative effect:

. /* regime 0 by hand */
> gen wage0=lmo_wage;

. replace wage0=. if private==1;
(261 real changes made, 261 to missing)

. heckman wage0 age age2 edu13 edu4 edu5 reg2 reg3 reg4
> , select(age age2 edu13 edu4 edu5 reg2 reg3 reg4 m_s1 job_hold) nolog;

Heckman selection model                         Number of obs      =      2094
(regression model with sample selection)        Censored obs       =       261
                                                Uncensored obs     =      1833

                                                Wald chi2(8)       =    742.40
Log likelihood = -2256.833                      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       wage0 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
wage0        |
         age |  -.0370977   .0111445    -3.33   0.001    -.0589405   -.0152549
        age2 |   .0003741   .0001285     2.91   0.004     .0001221     .000626
       edu13 |  -.5064365   .0885104    -5.72   0.000    -.6799137   -.3329593
        edu4 |  -.4106208   .0507949    -8.08   0.000    -.5101769   -.3110647
        edu5 |  -.2973453   .0391905    -7.59   0.000    -.3741573   -.2205333
        reg2 |  -.3781878    .042037    -9.00   0.000    -.4605788   -.2957968
        reg3 |   .7051728   .0532121    13.25   0.000     .6008791    .8094666
        reg4 |  -.2356527   .0474641    -4.96   0.000    -.3286807   -.1426247
       _cons |   9.323934   .2377026    39.23   0.000     8.858045    9.789822
-------------+----------------------------------------------------------------
select       |
         age |   .1466592    .025787     5.69   0.000     .0961176    .1972007
        age2 |  -.0013735   .0003037    -4.52   0.000    -.0019687   -.0007783
       edu13 |   -.076687    .245785    -0.31   0.755    -.5584168    .4050427
        edu4 |  -.0686979   .1415071    -0.49   0.627    -.3460467    .2086509
        edu5 |  -.2348396   .1063477    -2.21   0.027    -.4432772    -.026402
        reg2 |   .4412484   .0957427     4.61   0.000     .2535961    .6289006
        reg3 |   .5975879   .1186466     5.04   0.000     .3650448    .8301309
        reg4 |   .6023803   .1127047     5.34   0.000     .3814832    .8232775
        m_s1 |  -.1501144   .0912879    -1.64   0.100    -.3290354    .0288066
    job_hold |  -.0528969   .0360066    -1.47   0.142    -.1234687    .0176748
       _cons |  -2.540299   .5744352    -4.42   0.000    -3.666172   -1.414427
-------------+----------------------------------------------------------------
     /athrho |  -1.354341   .0813714   -16.64   0.000    -1.513826   -1.194856
    /lnsigma |  -.4219375   .0186557   -22.62   0.000     -.458502   -.3853731
-------------+----------------------------------------------------------------
         rho |  -.8750741   .0190609                     -.9076157   -.8320789
       sigma |    .655775   .0122339                        .63223    .6801968
      lambda |  -.5738517   .0197394                     -.6125403   -.5351632
------------------------------------------------------------------------------
LR test of indep. eqns. (rho = 0):   chi2(1) =    86.73   Prob > chi2 = 0.0000

Switching to the two-step estimator wipes out the results: the age coefficients and $\lambda$ are no longer significant:

. heckman wage0 age age2 edu13 edu4 edu5 reg2 reg3 reg4
> , select(age age2 edu13 edu4 edu5 reg2 reg3 reg4 m_s1 job_hold) twostep nolog;

Heckman selection model -- two-step estimates   Number of obs      =      2094
(regression model with sample selection)        Censored obs       =       261
                                                Uncensored obs     =      1833

                                                Wald chi2(8)       =    785.30
                                                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       wage0 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
wage0        |
         age |  -.0167191   .0266711    -0.63   0.531    -.0689936    .0355553
        age2 |   .0001794   .0002634     0.68   0.496     -.000337    .0006957
       edu13 |  -.5359287   .0884772    -6.06   0.000    -.7093407   -.3625166
        edu4 |  -.4296174   .0519778    -8.27   0.000    -.5314921   -.3277427
        edu5 |  -.3262479   .0496997    -6.56   0.000    -.4236576   -.2288383
        reg2 |  -.3635313   .0522697    -6.95   0.000     -.465978   -.2610847
        reg3 |   .7009662   .0576929    12.15   0.000     .5878902    .8140422
        reg4 |  -.2060982   .0685499    -3.01   0.003    -.3404535    -.071743
       _cons |   8.780537   .7149883    12.28   0.000     7.379185    10.18189
-------------+----------------------------------------------------------------
select       |
         age |   .1470191   .0276835     5.31   0.000     .0927603    .2012778
        age2 |  -.0013161   .0003254    -4.05   0.000    -.0019538   -.0006784
       edu13 |  -.3523344   .2632156    -1.34   0.181    -.8682275    .1635588
        edu4 |  -.2549107   .1524679    -1.67   0.095    -.5537422    .0439209
        edu5 |  -.4061875   .1181928    -3.44   0.001    -.6378411   -.1745339
        reg2 |   .3442086   .1039805     3.31   0.001     .1404105    .5480066
        reg3 |   .2811355   .1316528     2.14   0.033     .0231009    .5391702
        reg4 |    .543206   .1233285     4.40   0.000     .3014866    .7849253
        m_s1 |  -.1963717   .1067202    -1.84   0.066    -.4055396    .0127961
    job_hold |  -.0330325   .0426152    -0.78   0.438    -.1165567    .0504917
       _cons |  -2.355134   .6282926    -3.75   0.000    -3.586565   -1.123703
-------------+----------------------------------------------------------------
mills        |
      lambda |  -.3120253   .3390203    -0.92   0.357    -.9764929    .3524424
-------------+----------------------------------------------------------------
         rho |   -0.50931
       sigma |  .61264749
------------------------------------------------------------------------------
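For intuition about what the twostep option does, the point estimates above can be reproduced by hand with a probit followed by OLS. A sketch that assumes wage0 has been created as above and that the default line delimiter is active (run #delimit cr first if continuing the session shown here); sel0, zg and imr are names introduced for illustration.

```stata
* Step 1: probit for being observed in regime 0 (wage0 non-missing)
gen byte sel0 = !missing(wage0)
probit sel0 age age2 edu13 edu4 edu5 reg2 reg3 reg4 m_s1 job_hold

* Inverse Mills ratio evaluated at the probit index
predict double zg, xb
gen double imr = normalden(zg) / normal(zg)

* Step 2: OLS on the selected sample with the Mills ratio as a regressor
regress wage0 age age2 edu13 edu4 edu5 reg2 reg3 reg4 imr if sel0
```

The coefficient on imr plays the role of lambda in the mills row. The OLS standard errors ignore that imr is itself estimated, so only the coefficients should be compared with the heckman, twostep output.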

Now we replicate the second equation, with similar results:

. /* regime 1 by hand */
> gen wage1=lmo_wage;

. replace wage1=. if private==0;
(1833 real changes made, 1833 to missing)

. heckman wage1 age age2 edu13 edu4 edu5 reg2 reg3 reg4
> , select(age age2 edu13 edu4 edu5 reg2 reg3 reg4 m_s1 job_hold) nolog;

Heckman selection model                         Number of obs      =      2094
(regression model with sample selection)        Censored obs       =      1833
                                                Uncensored obs     =       261

                                                Wald chi2(8)       =    108.61
Log likelihood = -876.2178                      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       wage1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
wage1        |
         age |    .042193   .0292471     1.44   0.149    -.0151304    .0995163
        age2 |  -.0005043    .000317    -1.59   0.112    -.0011255    .0001169
       edu13 |   .3623925   .2845064     1.27   0.203    -.1952298    .9200149
        edu4 |  -.1449537   .1640929    -0.88   0.377    -.4665699    .1766624
        edu5 |  -.1525098   .1380139    -1.11   0.269    -.4230121    .1179925
        reg2 |  -.2824625   .1046311    -2.70   0.007    -.4875357   -.0773894
        reg3 |   .7258022   .1227223     5.91   0.000      .485271    .9663334
        reg4 |  -.1366248   .1385823    -0.99   0.324    -.4082412    .1349916
       _cons |   7.405325   .4670327    15.86   0.000     6.489958    8.320692
-------------+----------------------------------------------------------------
select       |
         age |  -.1454229   .0278551    -5.22   0.000    -.2000179   -.0908278
        age2 |   .0013006   .0003267     3.98   0.000     .0006602    .0019411
       edu13 |    .350613   .2632322     1.33   0.183    -.1653127    .8665387
        edu4 |    .255408   .1524632     1.68   0.094    -.0434144    .5542305
        edu5 |   .4065948    .118201     3.44   0.001      .174925    .6382645
        reg2 |  -.3430896   .1040295    -3.30   0.001    -.5469838   -.1391955
        reg3 |  -.2792558   .1317203    -2.12   0.034    -.5374229   -.0210887
        reg4 |  -.5416784   .1233898    -4.39   0.000     -.783518   -.2998389
        m_s1 |     .20616     .10797     1.91   0.056    -.0054572    .4177772
    job_hold |   .0366664   .0430434     0.85   0.394    -.0476972      .12103
       _cons |   2.305788   .6356115     3.63   0.000     1.060013    3.551564
-------------+----------------------------------------------------------------
     /athrho |    .148785   .3219452     0.46   0.644     -.482216     .779786
    /lnsigma |  -.5899825   .0569334   -10.36   0.000    -.7015699   -.4783951
-------------+----------------------------------------------------------------
         rho |   .1476968   .3149222                     -.4480166    .6525839
       sigma |    .554337   .0315603                      .4958063    .6197772
      lambda |   .0818738    .177591                     -.2661982    .4299457
------------------------------------------------------------------------------
LR test of indep. eqns. (rho = 0):   chi2(1) =     0.21   Prob > chi2 = 0.6457

. heckman wage1 age age2 edu13 edu4 edu5 reg2 reg3 reg4
> , select(age age2 edu13 edu4 edu5 reg2 reg3 reg4 m_s1 job_hold) twostep nolog;

Heckman selection model -- two-step estimates   Number of obs      =      2094
(regression model with sample selection)        Censored obs       =      1833
                                                Uncensored obs     =       261

                                                Wald chi2(8)       =     77.71
                                                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       wage1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
wage1        |
         age |  -.0364796   .0793241    -0.46   0.646     -.191952    .1189928
        age2 |   .0001722   .0007224     0.24   0.812    -.0012437    .0015881
       edu13 |   .5644776   .3613607     1.56   0.118    -.1437764    1.272732
        edu4 |  -.0199007    .215083    -0.09   0.926    -.4414556    .4016542
        edu5 |   .0559729   .2429543     0.23   0.818    -.4202087    .5321545
        reg2 |  -.4796038   .2147622    -2.23   0.026    -.9005299   -.0586776
        reg3 |   .5580965   .2076475     2.69   0.007     .1511148    .9650782
        reg4 |  -.4331516   .3103303    -1.40   0.163    -1.041388    .1750847
       _cons |   8.310527   .9964926     8.34   0.000     6.357438    10.26362
-------------+----------------------------------------------------------------
select       |
         age |  -.1470191   .0276835    -5.31   0.000    -.2012778   -.0927603
        age2 |   .0013161   .0003254     4.05   0.000     .0006784    .0019538
       edu13 |   .3523344   .2632156     1.34   0.181    -.1635588    .8682275
        edu4 |   .2549107   .1524679     1.67   0.095    -.0439209    .5537422
        edu5 |   .4061875   .1181928     3.44   0.001     .1745339    .6378411
        reg2 |  -.3442086   .1039805    -3.31   0.001    -.5480066   -.1404105
        reg3 |  -.2811355   .1316528    -2.14   0.033    -.5391702   -.0231009
        reg4 |   -.543206   .1233285    -4.40   0.000    -.7849253   -.3014866
        m_s1 |   .1963717   .1067202     1.84   0.066    -.0127961    .4055396
    job_hold |   .0330325   .0426152     0.78   0.438    -.0504917    .1165567
       _cons |   2.355134   .6282926     3.75   0.000     1.123703    3.586565
-------------+----------------------------------------------------------------
mills        |
      lambda |   .7462884   .6279665     1.19   0.235    -.4845034     1.97708
-------------+----------------------------------------------------------------
         rho |    0.87902
       sigma |   .8489974
------------------------------------------------------------------------------