If I understand correctly, you are "tricking" the Heckman selection model into estimating an endogenous switching regression model, also known as the Roy model or Tobit Type 5. This trick is explained in Lee, Lung-Fei (1978) "Unionism and Wage Rates: A Simultaneous Equations Model with Qualitative and Limited Dependent Variables", International Economic Review, Vol. 19(2), pp. 415-433. You're interested in whether worker characteristics are rewarded differently in the two regimes/sectors ($\beta_0 - \beta_1 \ne 0$), and the correlation parameter $\rho$ tells you about the effect of self-selected union membership on wages in the two sectors. You may, however, need to adjust the standard errors if the Heckman technique is used, or you lose consistency.
Alternatively, since you have an exclusion restriction, you can get the causal effect of union membership on wages using instrumental variables: `treatreg`/`etregress` for cross-sectional data, and all kinds of panel IV methods like `xtivreg`.
Some observations. First, there is a user-written Stata command called `movestay`, designed to estimate the endogenous switching regression model with cross-sectional data. It is a full information ML (FIML) approach, which relies on the assumption that the error terms are multivariate normal, as does the Heckman MLE method. If this assumption is satisfied, both will be consistent, though `movestay` will be somewhat more efficient than doing it in two parts.
The Heckman two-step limited information ML (LIML) estimator relies only on univariate normality of the marginal distribution, so it is expected to be more robust, since that is a lower hurdle to clear. If you do have joint normality, the two-step is still consistent but no longer efficient, especially relative to `movestay`. However, if you only have univariate normality, the two-step remains consistent while the FIML approaches do not.
In short, FIML and LIML approaches will usually differ since they have different information to work with, as I show below with an example. I think this explains question (1).
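To make the two-step correction term concrete, here is a minimal Python sketch (standard library only). It is not the estimator itself: it only checks by simulation that, when the selection error is standard normal and the outcome error is linearly related to it, the mean outcome error among selected observations equals $\rho\sigma$ times the inverse Mills ratio. The values of `c`, `rho`, and `sigma` are hypothetical and are not estimates from the data used in this answer.

```python
import random
from statistics import NormalDist


def inverse_mills(c: float) -> float:
    """phi(c) / Phi(c): the term the two-step estimator adds as a regressor."""
    std = NormalDist()
    return std.pdf(c) / std.cdf(c)


def simulated_selected_mean(c, rho, sigma, n=200_000, seed=12345):
    """Mean outcome error among selected observations.

    Selection rule: an observation is selected when c + v > 0, v ~ N(0, 1).
    Outcome error: e = sigma * (rho * v + sqrt(1 - rho^2) * w), w ~ N(0, 1),
    so corr(e, v) = rho and sd(e) = sigma.
    """
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n):
        v = rng.gauss(0.0, 1.0)
        w = rng.gauss(0.0, 1.0)
        if c + v > 0:
            total += sigma * (rho * v + (1.0 - rho ** 2) ** 0.5 * w)
            count += 1
    return total / count


# Hypothetical values chosen for illustration only.
c, rho, sigma = 0.5, -0.8, 0.65
print("simulated mean of e | selected:", simulated_selected_mean(c, rho, sigma))
print("rho * sigma * inverse Mills   :", rho * sigma * inverse_mills(c))
```

The two printed numbers should agree up to simulation noise. The two-step estimator exploits this by estimating `c` from a probit and adding the inverse Mills ratio to the wage regression, whose coefficient is then $\lambda = \rho\sigma$.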
Now for (2). As far as I know, there is no off-the-shelf panel version of `heckman` or `movestay`, though both allow you to cluster the standard errors on the panel id. That's not strictly correct, but may be good enough. There might also be a way to hack it using `gllamm`, though I have never done this myself since it appears non-trivial. Some notes on that here and Statalist threads here.
I am not really answering the second part of (2) since it is not clear to me how the fixed effects enter the model and how they are related to union membership. With the panel IV methods I suggested above, you give up estimating different parameters in the two regimes/sectors. Depending on the details of your model, you may not even need to instrument if you can difference away the pesky effect. The details depend on your data and models.
Finally, you might consider changing your notation to add instrument(s) Z (something that alters union membership, but is not related to wages directly) and the fixed effects.
Here's some code showing the `movestay` and Heckman MLE equivalence, along with the problem you have in (1). I am modeling wages with endogenous participation in public/union and private sectors. My instruments are marital status and the number of job holders in the household. They are likely not very good.
Here's the output:
. #delimit;
delimiter now ;
. use "http://www.adeptanalytics.org/download/ado/movestay/movestay_example.dta", clear;
(Sample dataset to illustrate the use of movestay procedure)
. capture ssc install movestay;
. /* Switching Regression */
> movestay lmo_wage age age2 edu13 edu4 edu5 reg2 reg3 reg4
> , select(private =age age2 edu13 edu4 edu5 reg2 reg3 reg4 m_s1 job_hold);
Fitting initial values .....
initial: log likelihood = -2615.3274
rescale: log likelihood = -2615.3274
rescale eq: log likelihood = -2504.2563
Iteration 0: log likelihood = -2504.2563
Iteration 1: log likelihood = -2478.8632
Iteration 2: log likelihood = -2472.9261
Iteration 3: log likelihood = -2471.4714
Iteration 4: log likelihood = -2470.979
Iteration 5: log likelihood = -2470.94
Iteration 6: log likelihood = -2470.9304
Iteration 7: log likelihood = -2470.9304
Endogenous switching regression model Number of obs = 2094
Wald chi2(8) = 742.40
Log likelihood = -2470.9304 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lmo_wage0 |
age | -.0370404 .0111445 -3.32 0.001 -.0588832 -.0151976
age2 | .0003735 .0001285 2.91 0.004 .0001216 .0006255
edu13 | -.5066122 .0885002 -5.72 0.000 -.6800694 -.333155
edu4 | -.410602 .0507909 -8.08 0.000 -.5101502 -.3110537
edu5 | -.2973613 .0391875 -7.59 0.000 -.3741673 -.2205552
reg2 | -.3780673 .0420359 -8.99 0.000 -.4604562 -.2956784
reg3 | .7053256 .0532104 13.26 0.000 .6010351 .8096162
reg4 | -.2355433 .0474621 -4.96 0.000 -.3285672 -.1425193
_cons | 9.322335 .2377244 39.21 0.000 8.856404 9.788266
-------------+----------------------------------------------------------------
lmo_wage1 |
age | .0423454 .0291869 1.45 0.147 -.01486 .0995507
age2 | -.0005007 .0003227 -1.55 0.121 -.0011332 .0001319
edu13 | .343707 .2793213 1.23 0.219 -.2037528 .8911667
edu4 | -.1578067 .1608107 -0.98 0.326 -.4729898 .1573764
edu5 | -.1640915 .1300285 -1.26 0.207 -.4189427 .0907598
reg2 | -.2864992 .10977 -2.61 0.009 -.5016445 -.071354
reg3 | .7076898 .1427077 4.96 0.000 .4279879 .9873917
reg4 | -.1383784 .1414154 -0.98 0.328 -.4155475 .1387908
_cons | 7.415708 .4807955 15.42 0.000 6.473366 8.35805
-------------+----------------------------------------------------------------
select |
age | -.1455147 .0258919 -5.62 0.000 -.196262 -.0947674
age2 | .0013623 .0003045 4.47 0.000 .0007655 .0019592
edu13 | .0761837 .2457816 0.31 0.757 -.4055394 .5579067
edu4 | .0690438 .1415167 0.49 0.626 -.2083237 .3464114
edu5 | .2351347 .1063559 2.21 0.027 .026681 .4435883
reg2 | -.4401673 .0958095 -4.59 0.000 -.6279506 -.2523841
reg3 | -.5960666 .118727 -5.02 0.000 -.8287672 -.3633661
reg4 | -.6010511 .1127811 -5.33 0.000 -.8220979 -.3800043
m_s1 | .1569937 .0921423 1.70 0.088 -.0236019 .3375893
job_hold | .0551942 .036172 1.53 0.127 -.0157017 .12609
_cons | 2.505468 .5789885 4.33 0.000 1.370672 3.640265
-------------+----------------------------------------------------------------
/lns0 | -.4220208 .0186565 -22.62 0.000 -.4585869 -.3854546
/lns1 | -.5903419 .056246 -10.50 0.000 -.700582 -.4801018
/r0 | 1.353758 .0813975 16.63 0.000 1.194222 1.513295
/r1 | .1457211 .3195399 0.46 0.648 -.4805656 .7720077
-------------+----------------------------------------------------------------
sigma0 | .6557204 .0122335 .6321763 .6801414
sigma1 | .5541378 .031168 .4962964 .6187204
rho0 | .8749375 .0190865 .8318838 .907522
rho1 | .1446983 .3128495 -.4466964 .6480954
------------------------------------------------------------------------------
LR test of indep. eqns. : chi2(2) = 86.94 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
The $\rho$s are the union effects. $\rho_0$ is positive and significant, so folks who choose to work in the public sector earn lower wages in that sector than a random individual from this sample. Those working in the private sector fare no better or worse than a random individual. The signs are a tad counterintuitive, but the authors of the Stata code parameterized $\rho$ as negative (see the conditional expectations part of the `movestay` paper). The likelihood-ratio test for joint independence of the three equations is reported in the last line of the output. The "frontslashed" parameters are ancillary. Some folks prefer to multiply $\rho$ and $\sigma$ to get $\lambda$, with standard errors estimated using the delta method.
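As a quick check on how the ancillary parameters map to the reported ones, the Python sketch below back-transforms the `/r` and `/lns` values from the output above. The mapping used here ($\rho = \tanh(r)$, $\sigma = \exp(\text{lns})$) is an assumption inferred from the reported numbers, and the function name `from_ancillary` is just for illustration. Delta-method standard errors for $\lambda$ would also need the covariance of the ancillary estimates, which the output does not show, so only point values are computed.

```python
import math


def from_ancillary(atanh_rho: float, ln_sigma: float):
    """Back-transform ancillary parameters to rho, sigma, and lambda = rho * sigma."""
    rho = math.tanh(atanh_rho)
    sigma = math.exp(ln_sigma)
    return rho, sigma, rho * sigma


# Values copied from the /r0, /lns0, /r1, /lns1 rows of the movestay output.
rho0, sigma0, lambda0 = from_ancillary(1.353758, -0.4220208)
rho1, sigma1, lambda1 = from_ancillary(0.1457211, -0.5903419)

print(f"regime 0: rho={rho0:.7f} sigma={sigma0:.7f} lambda={lambda0:.7f}")
print(f"regime 1: rho={rho1:.7f} sigma={sigma1:.7f} lambda={lambda1:.7f}")
```

The recovered `rho` and `sigma` values should match the `rho0`, `sigma0`, `rho1`, and `sigma1` rows of the table up to rounding.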
Now you can get the same estimates using `heckman` (though the signs on $\rho$ and on the selection-equation coefficients, including the two instruments, flip since the command is parameterized a bit differently). You see the same negative effect:
. /* regime 0 by hand */
> gen wage0=lmo_wage;
. replace wage0=. if private==1;
(261 real changes made, 261 to missing)
. heckman wage0 age age2 edu13 edu4 edu5 reg2 reg3 reg4
> , select(age age2 edu13 edu4 edu5 reg2 reg3 reg4 m_s1 job_hold) nolog;
Heckman selection model Number of obs = 2094
(regression model with sample selection) Censored obs = 261
Uncensored obs = 1833
Wald chi2(8) = 742.40
Log likelihood = -2256.833 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
wage0 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
wage0 |
age | -.0370977 .0111445 -3.33 0.001 -.0589405 -.0152549
age2 | .0003741 .0001285 2.91 0.004 .0001221 .000626
edu13 | -.5064365 .0885104 -5.72 0.000 -.6799137 -.3329593
edu4 | -.4106208 .0507949 -8.08 0.000 -.5101769 -.3110647
edu5 | -.2973453 .0391905 -7.59 0.000 -.3741573 -.2205333
reg2 | -.3781878 .042037 -9.00 0.000 -.4605788 -.2957968
reg3 | .7051728 .0532121 13.25 0.000 .6008791 .8094666
reg4 | -.2356527 .0474641 -4.96 0.000 -.3286807 -.1426247
_cons | 9.323934 .2377026 39.23 0.000 8.858045 9.789822
-------------+----------------------------------------------------------------
select |
age | .1466592 .025787 5.69 0.000 .0961176 .1972007
age2 | -.0013735 .0003037 -4.52 0.000 -.0019687 -.0007783
edu13 | -.076687 .245785 -0.31 0.755 -.5584168 .4050427
edu4 | -.0686979 .1415071 -0.49 0.627 -.3460467 .2086509
edu5 | -.2348396 .1063477 -2.21 0.027 -.4432772 -.026402
reg2 | .4412484 .0957427 4.61 0.000 .2535961 .6289006
reg3 | .5975879 .1186466 5.04 0.000 .3650448 .8301309
reg4 | .6023803 .1127047 5.34 0.000 .3814832 .8232775
m_s1 | -.1501144 .0912879 -1.64 0.100 -.3290354 .0288066
job_hold | -.0528969 .0360066 -1.47 0.142 -.1234687 .0176748
_cons | -2.540299 .5744352 -4.42 0.000 -3.666172 -1.414427
-------------+----------------------------------------------------------------
/athrho | -1.354341 .0813714 -16.64 0.000 -1.513826 -1.194856
/lnsigma | -.4219375 .0186557 -22.62 0.000 -.458502 -.3853731
-------------+----------------------------------------------------------------
rho | -.8750741 .0190609 -.9076157 -.8320789
sigma | .655775 .0122339 .63223 .6801968
lambda | -.5738517 .0197394 -.6125403 -.5351632
------------------------------------------------------------------------------
LR test of indep. eqns. (rho = 0): chi2(1) = 86.73 Prob > chi2 = 0.0000
Using the two-step estimator just kills the results; lambda is no longer statistically significant (p = 0.357):
. heckman wage0 age age2 edu13 edu4 edu5 reg2 reg3 reg4
> , select(age age2 edu13 edu4 edu5 reg2 reg3 reg4 m_s1 job_hold) twostep nolog;
Heckman selection model -- two-step estimates Number of obs = 2094
(regression model with sample selection) Censored obs = 261
Uncensored obs = 1833
Wald chi2(8) = 785.30
Prob > chi2 = 0.0000
------------------------------------------------------------------------------
wage0 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
wage0 |
age | -.0167191 .0266711 -0.63 0.531 -.0689936 .0355553
age2 | .0001794 .0002634 0.68 0.496 -.000337 .0006957
edu13 | -.5359287 .0884772 -6.06 0.000 -.7093407 -.3625166
edu4 | -.4296174 .0519778 -8.27 0.000 -.5314921 -.3277427
edu5 | -.3262479 .0496997 -6.56 0.000 -.4236576 -.2288383
reg2 | -.3635313 .0522697 -6.95 0.000 -.465978 -.2610847
reg3 | .7009662 .0576929 12.15 0.000 .5878902 .8140422
reg4 | -.2060982 .0685499 -3.01 0.003 -.3404535 -.071743
_cons | 8.780537 .7149883 12.28 0.000 7.379185 10.18189
-------------+----------------------------------------------------------------
select |
age | .1470191 .0276835 5.31 0.000 .0927603 .2012778
age2 | -.0013161 .0003254 -4.05 0.000 -.0019538 -.0006784
edu13 | -.3523344 .2632156 -1.34 0.181 -.8682275 .1635588
edu4 | -.2549107 .1524679 -1.67 0.095 -.5537422 .0439209
edu5 | -.4061875 .1181928 -3.44 0.001 -.6378411 -.1745339
reg2 | .3442086 .1039805 3.31 0.001 .1404105 .5480066
reg3 | .2811355 .1316528 2.14 0.033 .0231009 .5391702
reg4 | .543206 .1233285 4.40 0.000 .3014866 .7849253
m_s1 | -.1963717 .1067202 -1.84 0.066 -.4055396 .0127961
job_hold | -.0330325 .0426152 -0.78 0.438 -.1165567 .0504917
_cons | -2.355134 .6282926 -3.75 0.000 -3.586565 -1.123703
-------------+----------------------------------------------------------------
mills |
lambda | -.3120253 .3390203 -0.92 0.357 -.9764929 .3524424
-------------+----------------------------------------------------------------
rho | -0.50931
sigma | .61264749
------------------------------------------------------------------------------
Now we replicate the second equation, with similar results:
. /* regime 1 by hand */
> gen wage1=lmo_wage;
. replace wage1=. if private==0;
(1833 real changes made, 1833 to missing)
. heckman wage1 age age2 edu13 edu4 edu5 reg2 reg3 reg4
> , select(age age2 edu13 edu4 edu5 reg2 reg3 reg4 m_s1 job_hold) nolog;
Heckman selection model Number of obs = 2094
(regression model with sample selection) Censored obs = 1833
Uncensored obs = 261
Wald chi2(8) = 108.61
Log likelihood = -876.2178 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
wage1 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
wage1 |
age | .042193 .0292471 1.44 0.149 -.0151304 .0995163
age2 | -.0005043 .000317 -1.59 0.112 -.0011255 .0001169
edu13 | .3623925 .2845064 1.27 0.203 -.1952298 .9200149
edu4 | -.1449537 .1640929 -0.88 0.377 -.4665699 .1766624
edu5 | -.1525098 .1380139 -1.11 0.269 -.4230121 .1179925
reg2 | -.2824625 .1046311 -2.70 0.007 -.4875357 -.0773894
reg3 | .7258022 .1227223 5.91 0.000 .485271 .9663334
reg4 | -.1366248 .1385823 -0.99 0.324 -.4082412 .1349916
_cons | 7.405325 .4670327 15.86 0.000 6.489958 8.320692
-------------+----------------------------------------------------------------
select |
age | -.1454229 .0278551 -5.22 0.000 -.2000179 -.0908278
age2 | .0013006 .0003267 3.98 0.000 .0006602 .0019411
edu13 | .350613 .2632322 1.33 0.183 -.1653127 .8665387
edu4 | .255408 .1524632 1.68 0.094 -.0434144 .5542305
edu5 | .4065948 .118201 3.44 0.001 .174925 .6382645
reg2 | -.3430896 .1040295 -3.30 0.001 -.5469838 -.1391955
reg3 | -.2792558 .1317203 -2.12 0.034 -.5374229 -.0210887
reg4 | -.5416784 .1233898 -4.39 0.000 -.783518 -.2998389
m_s1 | .20616 .10797 1.91 0.056 -.0054572 .4177772
job_hold | .0366664 .0430434 0.85 0.394 -.0476972 .12103
_cons | 2.305788 .6356115 3.63 0.000 1.060013 3.551564
-------------+----------------------------------------------------------------
/athrho | .148785 .3219452 0.46 0.644 -.482216 .779786
/lnsigma | -.5899825 .0569334 -10.36 0.000 -.7015699 -.4783951
-------------+----------------------------------------------------------------
rho | .1476968 .3149222 -.4480166 .6525839
sigma | .554337 .0315603 .4958063 .6197772
lambda | .0818738 .177591 -.2661982 .4299457
------------------------------------------------------------------------------
LR test of indep. eqns. (rho = 0): chi2(1) = 0.21 Prob > chi2 = 0.6457
. heckman wage1 age age2 edu13 edu4 edu5 reg2 reg3 reg4
> , select(age age2 edu13 edu4 edu5 reg2 reg3 reg4 m_s1 job_hold) twostep nolog;
Heckman selection model -- two-step estimates Number of obs = 2094
(regression model with sample selection) Censored obs = 1833
Uncensored obs = 261
Wald chi2(8) = 77.71
Prob > chi2 = 0.0000
------------------------------------------------------------------------------
wage1 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
wage1 |
age | -.0364796 .0793241 -0.46 0.646 -.191952 .1189928
age2 | .0001722 .0007224 0.24 0.812 -.0012437 .0015881
edu13 | .5644776 .3613607 1.56 0.118 -.1437764 1.272732
edu4 | -.0199007 .215083 -0.09 0.926 -.4414556 .4016542
edu5 | .0559729 .2429543 0.23 0.818 -.4202087 .5321545
reg2 | -.4796038 .2147622 -2.23 0.026 -.9005299 -.0586776
reg3 | .5580965 .2076475 2.69 0.007 .1511148 .9650782
reg4 | -.4331516 .3103303 -1.40 0.163 -1.041388 .1750847
_cons | 8.310527 .9964926 8.34 0.000 6.357438 10.26362
-------------+----------------------------------------------------------------
select |
age | -.1470191 .0276835 -5.31 0.000 -.2012778 -.0927603
age2 | .0013161 .0003254 4.05 0.000 .0006784 .0019538
edu13 | .3523344 .2632156 1.34 0.181 -.1635588 .8682275
edu4 | .2549107 .1524679 1.67 0.095 -.0439209 .5537422
edu5 | .4061875 .1181928 3.44 0.001 .1745339 .6378411
reg2 | -.3442086 .1039805 -3.31 0.001 -.5480066 -.1404105
reg3 | -.2811355 .1316528 -2.14 0.033 -.5391702 -.0231009
reg4 | -.543206 .1233285 -4.40 0.000 -.7849253 -.3014866
m_s1 | .1963717 .1067202 1.84 0.066 -.0127961 .4055396
job_hold | .0330325 .0426152 0.78 0.438 -.0504917 .1165567
_cons | 2.355134 .6282926 3.75 0.000 1.123703 3.586565
-------------+----------------------------------------------------------------
mills |
lambda | .7462884 .6279665 1.19 0.235 -.4845034 1.97708
-------------+----------------------------------------------------------------
rho | 0.87902
sigma | .8489974
------------------------------------------------------------------------------
Question 1
If your outcome variable is integrated, you might consider using a single-equation generalized error correction model (GECM) as per Banerjee et al. (1993) and De Boef (2001), as this model is agnostic to the stationarity of the predictors.
You might evaluate the stationarity of your outcome using:
$\log{(GDP/Labor)_{ti}} \sim \rho_{i}\log{(GDP/Labor)_{t-1i}} + \zeta_{ti} + \mu_{\rho_{i}}$,
where:
$\zeta_{ti}$ measures all disturbances to $\log{(GDP/Labor)_{ti}}$ in each time $t$ (assumed distributed normal), and
$\mu_{\rho_{i}}$ measures state-level variation in $\log{(GDP/Labor)_{ti}}$ (assumed distributed normal).
If $|\rho_{i}| \approx 1$, then you've got nearly integrated data, and the GECM is a reasonable choice. It also has the attractive property of disentangling long-run effects from both instantaneous short-run effects and lagged short-run effects.
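A minimal Python sketch of the idea behind this check, under simplifying assumptions: it uses a single simulated series rather than a panel, drops the state-level term $\mu_{\rho_{i}}$, and estimates $\rho$ by OLS of $y_t$ on $y_{t-1}$. The function names are hypothetical. An estimate close to 1 is only suggestive, since formal unit-root tests rely on nonstandard distributions.

```python
import random


def simulate_ar1(rho: float, n: int, seed: int = 2024):
    """Simulate y_t = rho * y_{t-1} + e_t with standard normal shocks."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(rho * y[-1] + rng.gauss(0.0, 1.0))
    return y


def estimate_rho(y):
    """OLS slope from regressing y_t on y_{t-1} with an intercept."""
    lagged, current = y[:-1], y[1:]
    n = len(current)
    mean_lag = sum(lagged) / n
    mean_cur = sum(current) / n
    cov = sum((a - mean_lag) * (b - mean_cur) for a, b in zip(lagged, current))
    var = sum((a - mean_lag) ** 2 for a in lagged)
    return cov / var


print("stationary series, true rho 0.50:", estimate_rho(simulate_ar1(0.5, 5000)))
print("near-integrated,   true rho 0.99:", estimate_rho(simulate_ar1(0.99, 5000)))
```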
The general form of the single equation GECM is:
$\Delta y_{t} = \beta_{0} + \beta_{c}\left[y_{t-1}-\left(\mathbf{X}_{t-1}\right)\right] + \mathbf{B}_{\Delta\mathbf{X}}\Delta\mathbf{X}_{t} + \mathbf{B}_{\mathbf{X}}\mathbf{X}_{t-1} + \varepsilon$,
where:
$\Delta$ is the first difference operator (e.g. $\Delta y_{t} = y_{t} - y_{t-1}$), and $\varepsilon$ may be decomposed into mixed effects (e.g. by including $\beta_{0i}$, for country-level random intercepts).
instantaneous short run effects are given by $\beta_{\Delta\mathbf{X}}$,
lagged short run effects are given by $\beta_{\mathbf{X}} - \beta_{c} - \beta_{\Delta\mathbf{X}}$, and
long run effects are given by $\left(\beta_{c}-\beta_{\mathbf{X}}\right)/\beta_{c}$.
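The three effects can be checked numerically. The sketch below (Python, with hypothetical coefficient values and a single scalar predictor) iterates the GECM without noise after a one-time unit step in $x$: the first-period response is the instantaneous effect, and the path settles at the long-run effect $(\beta_{c}-\beta_{\mathbf{X}})/\beta_{c}$, provided $-2 < \beta_{c} < 0$ so that the error correction is stable.

```python
def gecm_path(beta_0, beta_c, beta_dx, beta_x, x_path, y_start):
    """Iterate dy_t = b0 + bc*(y_{t-1} - x_{t-1}) + bdx*dx_t + bx*x_{t-1}, no noise."""
    y = [y_start]
    for t in range(1, len(x_path)):
        dx = x_path[t] - x_path[t - 1]
        dy = (beta_0
              + beta_c * (y[-1] - x_path[t - 1])
              + beta_dx * dx
              + beta_x * x_path[t - 1])
        y.append(y[-1] + dy)
    return y


def long_run_effect(beta_c, beta_x):
    """Long-run response of y to a permanent unit change in x."""
    return (beta_c - beta_x) / beta_c


# Hypothetical coefficients; x jumps from 0 to 1 at t = 1 and stays there.
beta_0, beta_c, beta_dx, beta_x = 0.0, -0.3, 0.5, 0.2
x_path = [0.0] + [1.0] * 200
path = gecm_path(beta_0, beta_c, beta_dx, beta_x, x_path, y_start=0.0)

print("instantaneous effect (y at t=1):", path[1])
print("value after 200 periods        :", path[-1])
print("long-run formula               :", long_run_effect(beta_c, beta_x))
```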
This specification assumes a homogeneity of error correction processes. I haven't yet tried to derive a heterogeneous error correction specification...
In Stata you can run Hadri's LM test for panel data (`xtunitroot hadri`) on the residuals of such a model to check them for stationarity. Note that the null hypothesis of this test is that all panels are stationary.
Question 2
I do not know that I can say much useful here.
Question 3
Time dummies can be included in the GECM, and presumably in other dynamic time series models; they are often used as indicators of, for example, policies going into effect. I have done something similar, but used (time-varying) proportions (rather than 0/1 indicator variables) to represent the portion of the time period during which a policy was in effect (e.g. some policies go into effect January 1, some July 1, some December 21, etc.). On the other hand, you don't have tons of data, so I suppose it depends on how many new variables you are adding.
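One way to build such a proportion variable, sketched in Python under the assumption that the time period is a calendar year and that a policy stays in effect once it starts (the function name `fraction_in_effect` is hypothetical):

```python
from datetime import date


def fraction_in_effect(start: date, year: int) -> float:
    """Share of the given calendar year during which a policy was in effect."""
    period_start = date(year, 1, 1)
    period_end = date(year + 1, 1, 1)  # exclusive upper bound
    if start >= period_end:
        return 0.0  # policy had not started yet
    effective = max(start, period_start)
    return (period_end - effective).days / (period_end - period_start).days


print(fraction_in_effect(date(2021, 1, 1), 2021))    # in effect all year
print(fraction_in_effect(date(2021, 7, 1), 2021))    # roughly half the year
print(fraction_in_effect(date(2021, 12, 21), 2021))  # last 11 days only
```

The resulting value replaces the 0/1 dummy for the year of adoption; later years get 1 and earlier years get 0.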
References:
Banerjee, A., Dolado, J. J., Galbraith, J. W., and Hendry, D. F. (1993). Co-integration, error correction, and the econometric analysis of non-stationary data. Oxford University Press, USA.
De Boef, S. (2001). Modeling equilibrium relationships: Error correction models with strongly autoregressive data. Political Analysis, 9(1):78–94.
Best Answer
You are right: the fixed-effects and first-difference estimators are inconsistent, with substantial downward bias when $T$ is small.
The standard approach for a dynamic model and an unobserved fixed effect is to remove the fixed effect by first differencing and then finding instruments for the transformed regressors. All this assumes no serial correlation of the errors. If this is not the case, the parameters in your model are not identified and cannot be consistently estimated.
For your model, we get:
$s_{it} - s_{it-1} = \beta_0 s_{it-1} + \beta_1 ( s_{it-1}^2 + \text{ cross terms } ) + \epsilon_{it}-\epsilon_{it-1}$
As it is, both regressors must correlate with the error term. The only valid instruments will be $s_{it-2}$ or further back in time. Of course, instruments have to be good predictors of the regressors as well, otherwise you can have large biases ("weak instruments").
In theory you could use as many of the valid lags as instruments (or, in GMM terminology, moment conditions) as you want, and there are ways of cleverly doing that with GMM estimation that do not make your $T$ smaller than it already is (one observation is lost by first differencing alone).
References for these approaches would be the Arellano-Bond estimator and the Blundell-Bond estimator.
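To illustrate the logic, here is a small Python simulation under simplifying assumptions: a linear dynamic panel $y_{it} = \rho\, y_{it-1} + \alpha_i + \epsilon_{it}$ with serially uncorrelated errors, rather than the nonlinear model in the question. It compares OLS on the first-differenced equation with the simple Anderson-Hsiao IV estimator, which instruments $\Delta y_{it-1}$ with the level $y_{it-2}$. This is a simpler relative of the GMM estimators named above, not an implementation of them, and all function names and parameter values are hypothetical.

```python
import random


def simulate_panel(n_units, n_periods, rho, seed=7, burn_in=50):
    """Simulate y_it = rho * y_it-1 + alpha_i + e_it, discarding a burn-in."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        alpha = rng.gauss(0.0, 1.0)
        y = alpha / (1.0 - rho)  # start at the unit's long-run mean
        series = []
        for t in range(burn_in + n_periods):
            y = rho * y + alpha + rng.gauss(0.0, 1.0)
            if t >= burn_in:
                series.append(y)
        panel.append(series)
    return panel


def fd_ols(panel):
    """OLS of dy_t on dy_{t-1}: biased because dy_{t-1} contains e_{t-1}."""
    num = den = 0.0
    for y in panel:
        for t in range(2, len(y)):
            dy, dy_lag = y[t] - y[t - 1], y[t - 1] - y[t - 2]
            num += dy_lag * dy
            den += dy_lag * dy_lag
    return num / den


def anderson_hsiao(panel):
    """IV estimate using y_{t-2} as the instrument for dy_{t-1}."""
    num = den = 0.0
    for y in panel:
        for t in range(2, len(y)):
            dy, dy_lag = y[t] - y[t - 1], y[t - 1] - y[t - 2]
            z = y[t - 2]
            num += z * dy
            den += z * dy_lag
    return num / den


panel = simulate_panel(n_units=3000, n_periods=6, rho=0.5)
print("first-difference OLS:", fd_ols(panel))
print("Anderson-Hsiao IV   :", anderson_hsiao(panel))
```

With a true $\rho$ of 0.5, the first-difference OLS estimate should come out far too low, while the IV estimate should land near 0.5, with noticeably more sampling noise.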