Solved – Specification of panel data

econometrics, fixed-effects-model, panel data, stata

I am trying to find out the best specification for my dataset.

I am trying to assess the effectiveness of the special economic zones (SEZs) in Poland in terms of economic growth, using three similar panel data models for the dependent variables: a) registered unemployment rate, b) GDP per capita, c) gross fixed capital formation per capita. The data are for NUTS3 sub-regions. The explanatory variables are a 0-1 dummy for the presence of an SEZ in the sub-region in year $t$ and a few economic variables. The data are yearly, covering 2004-2012 for 66 sub-regions.

I have tried fixed and random effects. For now I have chosen FE because the coefficients are significant and have the theoretically correct signs. But some issues prevent me from taking it for granted:

1) How to test for autocorrelation and cross-correlation?
2) I have no idea how to test the error term's distribution in Stata, and furthermore if it is not normally distributed, should I care about it much?
3) As I understand from the literature, values of the correlation coefficient between explanatory variables and the error term near -1 or 1 are not bad as a matter of fact; in my case, it's nearly -1 as you can see.
4) Is a mixed model appropriate for my dataset?

I attach the outcome for the model explaining unemployment rate.

Code:

xtreg  st_bezr sse01 wartosc_sr_trw_per_capita zatr_przem_bud podm_gosp_na_10tys_ludn proc_ludn_wiek_prod ludnosc_na_km2, fe

Fixed-effects (within) regression               Number of obs      =       594
Group variable: id                              Number of groups   =        66

R-sq:  within  = 0.4427                         Obs per group: min =         9
       between = 0.3479                                        avg =       9.0
       overall = 0.2365                                        max =         9

                                                F(6,522)           =     69.10
corr(u_i, Xb)  = -0.9961                        Prob > F           =    0.0000

-------------------------------------------------------------------------------------------
                  st_bezr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------------------+----------------------------------------------------------------
                    sse01 |  -1.406066   .4631984    -3.04   0.003    -2.316028   -.4961045
wartosc_sr_trw_per_capita |  -.0000963   .0000166    -5.79   0.000    -.0001289   -.0000636
           zatr_przem_bud |  -26.11989   4.992198    -5.23   0.000    -35.92716   -16.31263
  podm_gosp_na_10tys_ludn |  -.0201788   .0030788    -6.55   0.000    -.0262273   -.0141304
      proc_ludn_wiek_prod |  -229.1996   16.92631   -13.54   0.000    -262.4516   -195.9475
           ludnosc_na_km2 |   .0790167   .0120865     6.54   0.000     .0552726    .1027609
                    _cons |   161.9786   10.76989    15.04   0.000      140.821    183.1363
--------------------------+----------------------------------------------------------------
                  sigma_u |  53.986519
                  sigma_e |  2.5446248
                      rho |  .99778327   (fraction of variance due to u_i)
-------------------------------------------------------------------------------------------
F test that all u_i=0:     F(65, 522) =    27.09             Prob > F = 0.0000

Best Answer

For the Stata commands in this answer let me collect your variables in a local:
local xlist sse01 wartosc_sr_trw_per_capita zatr_przem_bud podm_gosp_na_10tys_ludn proc_ludn_wiek_prod ludnosc_na_km2
Now you can always refer to all the variables with `xlist' (note the opening backtick and the closing apostrophe).

1) There are two commands that you can use after your fixed effects regression.

xttest2 performs a Breusch-Pagan LM test with the null hypothesis of no dependence between the residuals. This is a test for contemporaneous correlation. Not rejecting the null means that the test did not detect any cross-sectional dependence in your residuals.
xttest3 performs a modified version of the Wald test for groupwise heteroscedasticity. The null hypothesis is homoscedasticity.

You can install both commands by typing ssc install xttest2 and ssc install xttest3. If you detect heteroscedasticity in your residuals you can correct the standard errors for this with the robust option:
xtreg st_bezr `xlist', fe robust

To test for autocorrelation you can apply the Wooldridge test for serial correlation in panel data via xtserial:
xtserial st_bezr `xlist'
The null hypothesis is no serial correlation. To correct for both serial correlation and heteroscedasticity you can use the cluster option with your id variable:
xtreg st_bezr `xlist', fe cluster(id)
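
To show how these pieces fit together, here is a minimal sketch of the whole sequence in one do-file. It runs on Stata's Grunfeld example panel (10 firms, 20 years) rather than on the SEZ data, so the variable names invest, mvalue, kstock and company are stand-ins for st_bezr, the regressors and id. It assumes xttest2, xttest3 and xtserial are already installed. As far as I know, xttest2 needs more time periods than panels, so it runs here but may refuse to run on a 66 x 9 panel like the one in the question.

```stata
* Sketch only: Grunfeld example data stands in for the SEZ panel
webuse grunfeld, clear
xtset company year

* Collect the regressors in a local, as in the answer
local xlist mvalue kstock

* Fixed effects regression, then the post-estimation tests
xtreg invest `xlist', fe
xttest2                      // Breusch-Pagan LM test, H0: no cross-sectional dependence
xttest3                      // modified Wald test, H0: groupwise homoscedasticity

* Serial correlation test, H0: no first-order autocorrelation
xtserial invest `xlist'

* Re-estimate with standard errors clustered on the panel id
xtreg invest `xlist', fe vce(cluster company)
```

Because locals only live for the duration of one run, the local definition and the commands that use it have to be executed together.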

2) For the normality test of the residuals: you can obtain the residuals with predict res, e after your fixed effects regression. For visual inspection you can use:

kdensity res, normal (plots the distribution of the residuals and compares it to a normal)
pnorm res (plots a standardized normal probability plot)
qnorm res (plots the quantiles of the residuals against the quantiles of a normal distribution)

With pnorm you can see whether there is non-normality in the middle of the distribution, and qnorm shows you any non-normality in the tails. A formal test can be obtained with swilk res. The null hypothesis is that the residuals are normally distributed. Generally, non-normality is not too big a concern, but it matters for inference. You can again correct for this with the robust option.
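
The residual checks in 2) can be sketched as follows. Again this uses the Grunfeld example panel as a stand-in for the SEZ data, and res is just the illustrative variable name from above.

```stata
* Sketch only: Grunfeld example data stands in for the SEZ panel
webuse grunfeld, clear
xtset company year
xtreg invest mvalue kstock, fe

* Idiosyncratic residuals e_it from the fixed effects regression
predict res, e

* Visual checks
kdensity res, normal         // kernel density with a normal density overlaid
pnorm res                    // sensitive to departures in the middle
qnorm res                    // sensitive to departures in the tails

* Formal test, H0: residuals are normally distributed
swilk res
display "Shapiro-Wilk p-value: " r(p)
```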

3) Having corr(u_i, Xb) = -0.9961 means that the fixed effects are strongly correlated with your explanatory variables, so you did well by controlling for these fixed effects. A strong correlation of this type usually indicates that pooled OLS or random effects will not be suitable for your purpose because both of these models assume that the correlation between $u_i$ and $X\beta$ is zero.
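
If you want a formal check to go with this reading of corr(u_i, Xb), one common complement (not something the output above already contains) is a Hausman test of fixed against random effects. A minimal sketch, once more on the Grunfeld example data with stand-in variable names:

```stata
* Sketch only: Grunfeld example data stands in for the SEZ panel
webuse grunfeld, clear
xtset company year

* Fixed effects: xtreg stores corr(u_i, Xb) in e(corr)
xtreg invest mvalue kstock, fe
display "corr(u_i, Xb) = " e(corr)
estimates store fe

* Random effects on the same specification
xtreg invest mvalue kstock, re
estimates store re

* H0: the difference in coefficients is not systematic (RE is consistent)
hausman fe re
```

Rejecting the null points in the same direction as a large corr(u_i, Xb): random effects is not appropriate and fixed effects should be preferred.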

4) Generally yes, but it depends on what you want to estimate and how you can treat your data, i.e. whether your variables are random variables or not. Here is an excellent explanation for the difference between mixed effects and panel data models by @mpiktas which will surely help you.

Related Solutions

Solved – Dynamic Panel models, GMM, Stata

Since you have long-narrow panel data ($T > N$), the easiest way is to fit OLS for each of the 6 countries separately. If you suspect cross-sectional dependence between the $N$ units, you can run sureg (seemingly unrelated regression) in Stata to account for cross-equation correlation in the OLS residuals.

Solved – Fixed Effects Gravity Model for forecasting..with time constant variables Stata

What they do in the paper is that they estimate their gravity model, say equation 5.2, using the fixed effects estimator and they estimate the fixed effects directly to use them later in equation 6. You can do this with the predict command after xtreg. In Stata this would be:

xtreg IX lYi lYj lNi lNj lD lIi lIj Pij1-Pijh, fe
predict IE, u

In the fixed effects regression all the time-invariant variables drop out as the authors stated. The predict command then gives you the individual effects $\text{IE}$ which they use in equation 6.
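
To see what predict with the u option returns, here is a small sketch on Stata's Grunfeld example panel. It is not the gravity data from the paper; invest, mvalue, kstock and company are stand-ins, and IE is the illustrative name used above.

```stata
* Sketch only: Grunfeld example data stands in for the gravity data
webuse grunfeld, clear
xtset company year

* Fixed effects regression, then the estimated individual effects u_i
xtreg invest mvalue kstock, fe
predict IE, u

* Each panel gets a single value of IE, repeated over all its years
bysort company (year): assert IE == IE[1]
tabstat IE, by(company) statistics(mean)
```

The estimated effects are constant within each panel, which is why they can stand in for the time-invariant variables that the within transformation removes.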

With regard to your note, I'm not sure if the same procedure applies to xtpoisson given that the interpretation of the estimated fixed effects changes. For this have a look at a similar question on the Statalist with the corresponding answer by Maarten Buis. He is also active on CV so if you're lucky he can provide you with guidance on this. Otherwise I would guess that Martinez-Zarzoso and Nowak-Lehmann had the same problem with the many zeros (I suppose their data is similar to yours given the similarity of the application) and yet they had their reasons to stick to linear models.
I hope this helps.
