I have a large panel data set of global trade flows and would like to build a model for forecasting trade flows for two specific country pairs. I am having trouble figuring out how to do this in Stata using fixed effects while still accounting for the individual effects.
I found this research paper and would essentially like to recreate the process explained on p.299: http://ageconsearch.umn.edu/bitstream/43996/2/martinez.pdf
> A problem we faced with FEM is that we cannot directly estimate variables that do not change over time because the inherent transformation wipes out such variables. However, these variables can be easily estimated in a second step, running another regression with the individual effects as the dependent variable and distance and dummies as explanatory variables
I can easily estimate `xtreg` with trade volume regressed on GDP/GDP per capita, but what are the steps I then take to estimate the individual effects (dist, island, etc.) in Stata?
Note: I assume they are using FE OLS, but because my data contains many zero trade flows, is it possible to apply this method with `xtpoisson` instead?
Best Answer
What they do in the paper is estimate their gravity model, say equation 5.2, using the fixed effects estimator, and they estimate the fixed effects directly to use them later in equation 6. You can do this with the `predict` command after `xtreg`.

In the fixed effects regression all the time-invariant variables drop out, as the authors stated. The `predict` command then gives you the individual effects $\text{IE}$, which they use in equation 6.

With regard to your note, I'm not sure whether the same procedure applies to `xtpoisson`, given that the interpretation of the estimated fixed effects changes. For this, have a look at a similar question on Statalist with the corresponding answer by Maarten Buis. He is also active on CV, so if you're lucky he can provide you with guidance on this. Otherwise, I would guess that Martinez-Zarzoso and Nowak-Lehmann had the same problem with the many zeros (I suppose their data is similar to yours given the similarity of the application), and yet they had their reasons to stick to linear models.

I hope this helps.
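For reference, the two-step procedure described in this answer (a fixed effects regression, `predict` to recover the individual effects, then a second regression on the time-invariant variables) might look roughly like the sketch below. This is not the paper's own code. All variable names (`lntrade`, `lngdp`, `lngdppc`, `lndist`, `island`, `pair`, `year`) are hypothetical, and the data are simulated only so that the example runs on its own. Keeping one observation per country pair in the second step is also an assumption made here, not something taken from the paper.

```stata
* Simulated panel with hypothetical variable names; replace with your own data.
clear
set seed 12345
set obs 50                          // 50 country pairs
gen pair   = _n
gen lndist = rnormal(8, 1)          // time-invariant: log distance
gen island = runiform() < 0.2       // time-invariant: island dummy
expand 10                           // 10 years per pair
bysort pair: gen year = 2000 + _n
gen lngdp   = rnormal(10, 1)
gen lngdppc = rnormal(9, 1)
gen lntrade = 1 + 0.8*lngdp + 0.3*lngdppc - 1.2*lndist + 0.5*island + rnormal()

xtset pair year

* Step 1: fixed effects regression. Time-invariant regressors such as
* lndist and island cannot be included because the within transformation
* removes them.
xtreg lntrade lngdp lngdppc, fe

* Recover the estimated individual (pair) effects u_i.
predict ie, u

* Step 2: regress the individual effects on the time-invariant variables.
* ie is constant within a pair, so keep one observation per pair
* (an assumption of this sketch).
egen first = tag(pair)
regress ie lndist island if first
```

With real data you would drop the simulation lines, `xtset` your own pair and time identifiers, and substitute your own regressors in both steps.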