Regression Coefficients Comparison – How to Compare Coefficients from Two Separate Panel Regressions in Stata

fixed-effects-modelpanel dataregression coefficientsstata

I am trying to compare the coefficients of two panel data regressions with the same dependent variable. What I am aiming at is the following:

    y1 = c + β x
    y2 = c + β x

In Stata

    xtreg y1 x i.z
    xtreg y2 x i.z

I want to check whether the βs are significantly different.

With two regular regressions I would use something like the following code in Stata to test a cross-equation restriction:

    sureg (y1 x ) (y2 x )
    lincom [y1]x - [y2]x

However Stata is explaining that this is not possible when I try to use xtreg.

I have searched a lot of different sites. However I do not have a clue, it only made me more confused. Hopefully you guys can help.

edit

I am using a Fixed effect models

Best Answer

There's a user-written panel version of the random effects SUR estimator that you can obtain with ssc install xtsur. I am assuming you are using a RE estimator since that is the default with xtreg. The "add a constant" part is a bit of a hack, and I can't quite tell if it is in fact a bad idea.

Here's an toy example of what this would look like:

. webuse nlswork
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. gen constant=1

. xtsur (ln_wage constant age) (hours constant age)
(running multi-step estimates...)

Calculating multi-step estimates...
Iteration   1 : relative difference =  .00761817
Iteration   2 : relative difference =  6.278e-11


Seemingly unrelated regression (SUR) in panel data set

One-way random effect estimation:
------------------------------------------------------------------------------
Number of Group variable:   15                  Number of obs      =     28443
Panel variable: idcode                          Number of eqn      =         2
Time variable : year                            Number of panels   =        15

Random effects u_i ~ Gaussian
corr(u_i, e_it)    = 0 (assumed)
Panel type         : unbalanced

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ln_wage      |
    constant |   1.091836   .0125271    87.16   0.000     1.067283    1.116388
         age |   .0192946   .0002957    65.26   0.000     .0187151     .019874
-------------+----------------------------------------------------------------
hours        |
    constant |   37.04146   .2495206   148.45   0.000     36.55241    37.53052
         age |  -.0271416   .0071549    -3.79   0.000    -.0411649   -.0131183
-------------+----------------------------------------------------------------
     sigma_u |   see e(sigma_u)
     sigma_e |   see e(sigma_e)
------------------------------------------------------------------------------
Dependent variables:   ln_wage hours 
Independent variables: age 
------------------------------------------------------------------------------

. test [ln_wage]age=[hours]age

 ( 1)  [ln_wage]age - [hours]age = 0

           chi2(  1) =   42.51
         Prob > chi2 =    0.0000

lincom would also work here:

. lincom [ln_wage]age - [hours]age

 ( 1)  [ln_wage]age - [hours]age = 0

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   .0464362   .0071221     6.52   0.000     .0324772    .0603951
------------------------------------------------------------------------------

The coefficients match the output of xtreg pretty closely in this case, though they won't be identical:

. xtreg ln_wage age, re

Random-effects GLS regression                   Number of obs      =     28510
Group variable: idcode                          Number of groups   =      4710

R-sq:  within  = 0.1026                         Obs per group: min =         1
       between = 0.0877                                        avg =       6.1
       overall = 0.0774                                        max =        15

                                                Wald chi2(1)       =   3140.35
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0185667   .0003313    56.04   0.000     .0179174    .0192161
       _cons |   1.120439   .0112038   100.01   0.000      1.09848    1.142398
-------------+----------------------------------------------------------------
     sigma_u |  .36972456
     sigma_e |  .30349389
         rho |  .59743613   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xtreg hours age, re

Random-effects GLS regression                   Number of obs      =     28443
Group variable: idcode                          Number of groups   =      4709

R-sq:  within  = 0.0005                         Obs per group: min =         1
       between = 0.0007                                        avg =       6.0
       overall = 0.0002                                        max =        15

                                                Wald chi2(1)       =      7.81
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0052

------------------------------------------------------------------------------
       hours |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0240426   .0086031    -2.79   0.005    -.0409043   -.0071809
       _cons |   36.97867   .2717048   136.10   0.000     36.44613     37.5112
-------------+----------------------------------------------------------------
     sigma_u |  6.4129132
     sigma_e |  8.2312259
         rho |  .37771867   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Related Solutions

Solved – Time-invariant variables not being removed in Fixed Effects model. And feasibility of addional time dummies in Fixed Effect/Random modelling

Having an unbalanced panel is not a problem nowadays. In the past, when econometrics had to be done by hand, inverting matrices for unbalanced panels was more difficult but for computers this is not a problem. The only worry connected today with this is the question why the panel is unbalanced: is it due to attrition? If yes, is this attrition random or related to characteristics of the statistical units? For instance, in surveys people with higher education tend to be more responsive and stay in the panel longer for that reason.

Regarding the fixed effects model, have you checked whether the variables that are time-invariant in theory are actual not varying over time? Sometimes coding errors sneak in and then all the sudden a variable varies over time when it shouldn't. One way of checking this is to use the xtsum command which displays overall, between, and within summary statistics. The time-invariant variables should have a zero within standard deviation. If they don't then something went wrong in the coding.

Having a negative Hausman test statistics is a bad thing because the matrices that the test is built on are positive semi-definite and therefore the theoretical values of the test are positive. Negative values point towards model misspecification or a too small sample (related to this is this question).

If you cluster your standard errors you also need a modified version of the Hausman test. This is implemented in the xtoverid command. You can use it like this:

xtreg ln_r_prisperkg_Frst_102202 Dflere_mottak_tur i.landingsfylkekode i.kvartiler_ny markedsk_torsk gjenv_TAC_NØtorsk_år_prct lalder_fartøy i.fangstr r_minst_Frst_torsk gjenv_kvote_NØtorsk_fartøy_prct i.lengde_gruppering mobilitet, fe vce(cluster fartyid)
xtoverid

Rejecting the null rejects the validity of the assumptions underlying the random effects mode.

The xtset command only takes into account the unit id for fixed effects estimation. The time variable does not eliminate time fixed effects. So if you do

xtset id time
xtreg y x, fe

will give you the exact same results as

xtset id
xtreg y x, fe

The time variable is only specified for commands for which the sorting order of the data matters, for instance xtserial which tests for panel autocorrelation requires this. This has been discussed here. So if you want to include time fixed effects, you need to include the day dummies separately via i.day, for example. In this context, the season and year dummies make sense so it's good that you use them.

Stata – Differences in Differences, Fixed Effects, and Standard Errors Explained

You need to compare apples to apples, so use clustering with OLS and clustering with xtreg, fe (or robust with xtreg, fe, which will default to clustering as Thomas pointed out). These coefficient equivalences are limited to two-period (one pre, and one post) datasets with treatment at the same time for all treated units.

Here's an example of 2x2 DID on a public dataset demonstrating this. Here NJ restaurants are treated (become subject to the minimum wage increase) and PA restaurant are not. February '92 (t=0) is pre and November '92 is post (t=1). The DID parameter is the interaction of t = 1 and NJ = 1. The outcome fte is full-time equivalent employees. Here I will balance the panel in order to get xtreg, fe and OLS to give the same coefficient estimates. If the panel is unbalanced (consists of repeated cross-sections), xtreg, fe will drop some observations that appear in only one year and the estimates will no longer match OLS or manual calculations. You may want to stick with clustered OLS if you have a repeated cross-section.

Here is the result. Note that you can use factor variable notation to create the interactions rather than hard coding them.

. use http://fmwww.bc.edu/repec/bocode/c/CardKrueger1994.dta, clear
(Dataset from Card&Krueger (1994))

. drop if id == 407 // duplicate restaurant
(4 observations deleted)

. drop if missing(fte, treated, t, id)
(19 observations deleted)

. bysort id: keep if _N==2 // balance the panel
(19 observations deleted)

. xtset id t
       panel variable:  id (strongly balanced)
        time variable:  t, 0 to 1
                delta:  1 unit

. 
. /* calculate DID by hand */
. table treated t, c(mean fte N fte) row col

----------------------------------------
New       |
Jersey =  |
1;        |
Pennsylva | Feb. 1992 = 0; Nov. 1992 = 1
nia = 0   |        0         1     Total
----------+-----------------------------
       PA | 20.17333     17.65  18.91167
          |       75        75       150
          | 
       NJ | 17.06927  17.51831  17.29379
          |      314       314       628
          | 
    Total | 17.66774   17.5437  17.60572
          |      389       389       778
----------------------------------------

. di %9.3f (17.51831 - 17.06927) - (17.65 - 20.17333)
    2.972

. 
. /* fit models */
. eststo ols_robust:   qui   reg fte i.treated##i.t, robust

. eststo xtreg_robust: qui xtreg fte i.treated##i.t, fe robust

. eststo xtreg_clust:  qui xtreg fte i.treated##i.t, fe cluster(id)

. eststo ols_clust:    qui   reg fte i.treated##i.t, cluster(id)

. 
. capture ssc install estout

. esttab *, se(%9.7f) noomitted drop(0.treated 0.t 0.treated#0.t) modelwidt(15) mtitles label varwidth(35)

---------------------------------------------------------------------------------------------------------------
                                                (1)                (2)                (3)                (4)   
                                         ols_robust       xtreg_robust        xtreg_clust          ols_clust   
---------------------------------------------------------------------------------------------------------------
NJ                                           -3.104*                                                  -3.104*  
                                        (1.4475664)                                              (1.4484988)   

Feb. 1992 = 0; Nov. 1992 = 1=1               -2.523             -2.523*            -2.523*            -2.523*  
                                        (1.6371048)        (1.2498119)        (1.2498119)        (1.2506190)   

NJ # Feb. 1992 = 0; Nov. 1992 = 1=1           2.972              2.972*             2.972*             2.972*  
                                        (1.7822146)        (1.3337493)        (1.3337493)        (1.3346107)   

Constant                                      20.17***           17.67***           17.67***           20.17***
                                        (1.3591695)        (0.2232501)        (0.2232501)        (1.3600450)   
---------------------------------------------------------------------------------------------------------------
Observations                                    778                778                778                778   
---------------------------------------------------------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001

Clustering in DID settings is a good idea for reasons outlined in Bertrand, Duflo, and Mullainathan's 2004 QJE paper. Clustering at the level of treatment is also a good idea, but here that is not feasible since there are not enough clusters (since treatment is a state law and we have data from two states only) for that to work well. Generally your SEs will go up when you cluster in DID, but if the errors are negatively correlated within cluster, they might shrink. See this post for the reasons why.

Code:

estimates clear
cls
use http://fmwww.bc.edu/repec/bocode/c/CardKrueger1994.dta, clear
drop if id == 407 // duplicate restaurant
drop if missing(fte, treated, t, id)
bysort id: keep if _N==2 // balance the panel
xtset id t

/* calculate DID by hand */
table treated t, c(mean fte N fte) row col
di %9.3f (17.51831 - 17.06927) - (17.65 - 20.17333)

/* fit models */
eststo ols_robust:   qui   reg fte i.treated##i.t, robust
eststo xtreg_robust: qui xtreg fte i.treated##i.t, fe robust
eststo xtreg_clust:  qui xtreg fte i.treated##i.t, fe cluster(id)
eststo ols_clust:    qui   reg fte i.treated##i.t, cluster(id)

capture ssc install estout
esttab *, se(%9.7f) noomitted drop(0.treated 0.t 0.treated#0.t) modelwidt(15) mtitles label varwidth(35)

edit

Best Answer

Related Solutions

Solved – Time-invariant variables not being removed in Fixed Effects model. And feasibility of addional time dummies in Fixed Effect/Random modelling

Stata – Differences in Differences, Fixed Effects, and Standard Errors Explained

Related Question