Using pooled OLS when running a model with panel data

fixed-effects-model, panel data, pooling

How bad is it to use pooled OLS instead of fixed effects when you have 7 years of panel data? From what I have understood, the risk is that the explanatory variables will be correlated with the error term, thus making the estimates biased. There will be some form of endogeneity.

Would it help if I include year dummies in the pooled OLS regression? It still wouldn’t capture the effects of varying intercept in the individual dimension, right?

One of my major explanatory variables is significant at the 5% level in FE regression. In the pooled OLS it is significant at the 0.001 level. Is this result negligible or could it still be used with the reservation that it is overestimated?

I ask this because most of the estimated parameters are strongly significant in the pooled OLS regression. Also, two of my explanatory variables that are constant get dropped in the FE regression. Although they are of secondary interest they contribute by explaining quite a lot of the variation in the dependent variable. (The sample is btw not congruent with a random effects model).

Is there some way to decide which model might be more suitable? If you know some things I should keep in mind when implementing the models I would be very grateful to hear them!

(I asked this question at another forum. I'll update if I get an answer.
http://www.talkstats.com/showthread.php/56320-Using-pooled-OLS-when-running-a-model-with-panel-data?p=159061#post159061 )

Best Answer

Adding the year dummies to your regression is unlikely to solve the problem of unobserved fixed effects. You should definitely include them both in OLS and fixed effects regressions to account for annual fluctuations in your dependent variable that were not due to any of your explanatory variables.
Only the fixed effects estimator (or first differencing) eliminates all unobserved fixed effects. As you said, the observed time-invariant variables (like gender or similar) also get dropped, but this is not a problem unless you are interested in actually estimating their coefficients. Their contribution to the variation in the dependent variable just gets absorbed into the overall individual fixed effect. If you are interested in estimating the coefficients of these time-invariant variables, see here.
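
To make this concrete, here is a minimal simulation sketch (not taken from your data). The panel size, the true slope of 1.0 and the way the unobserved effect `alpha` is correlated with `x` are all assumptions chosen for illustration. It compares the pooled OLS slope with the within (fixed effects) and first-difference slopes:

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_years = 500, 7        # 7 years as in the question; 500 units is assumed
beta_true = 1.0                  # hypothetical true slope

# Unobserved individual effect, deliberately correlated with x (assumption)
alpha = rng.normal(size=(n_units, 1))
x = 0.8 * alpha + rng.normal(size=(n_units, n_years))
y = beta_true * x + alpha + rng.normal(size=(n_units, n_years))

# Pooled OLS with an intercept: alpha stays in the error term
xc = x.ravel() - x.mean()
b_pooled = float(xc @ (y.ravel() - y.mean()) / (xc @ xc))

# Within (fixed effects) estimator: subtract each unit's own time mean
x_w = x - x.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)
b_fe = float((x_w * y_w).sum() / (x_w ** 2).sum())

# First differences: alpha cancels in y_it - y_i,t-1
dx, dy = np.diff(x, axis=1), np.diff(y, axis=1)
b_fd = float((dx * dy).sum() / (dx ** 2).sum())

print(f"pooled OLS: {b_pooled:.3f}  FE: {b_fe:.3f}  FD: {b_fd:.3f}")
```

Under these assumptions the pooled slope converges to roughly 1 + 0.8/1.64 ≈ 1.49 rather than 1.0, while the within and first-difference slopes stay close to the true value. Year dummies would not change this, because `alpha` varies across individuals, not across years.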

Note that panel data models need a correction of the standard errors for serial correlation (e.g. by clustering on the individual's ID variable). This might be the reason why your OLS standard errors are so small.
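
A rough sketch of what clustering does, again on simulated data with made-up parameters: the error has an individual-level component, so residuals of the same individual are correlated across years. The code computes conventional OLS standard errors and a cluster-robust sandwich estimate by hand with numpy. In practice you would use your software's built-in option for this.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_years = 300, 7        # assumed panel dimensions

# Both x and the error are persistent within an individual (assumption)
x = rng.normal(size=(n_units, 1)) + 0.5 * rng.normal(size=(n_units, n_years))
u = rng.normal(size=(n_units, 1)) + rng.normal(size=(n_units, n_years))
y = 1.0 + 0.5 * x + u

X = np.column_stack([np.ones(x.size), x.ravel()])   # rows ordered unit by unit
yv = y.ravel()
ids = np.repeat(np.arange(n_units), n_years)        # individual ID per row

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ yv
resid = yv - X @ beta
n, k = X.shape

# Conventional standard errors, valid only for i.i.d. errors
se_ols = np.sqrt(np.diag(XtX_inv) * (resid @ resid) / (n - k))

# Cluster-robust sandwich: sum the scores x_it * e_it within each individual
scores = np.zeros((n_units, k))
np.add.at(scores, ids, X * resid[:, None])
meat = scores.T @ scores
# A common finite-sample adjustment: G/(G-1) * (n-1)/(n-k)
adj = n_units / (n_units - 1) * (n - 1) / (n - k)
se_cluster = np.sqrt(np.diag(adj * XtX_inv @ meat @ XtX_inv))

print("conventional SE:", se_ols)
print("clustered SE:   ", se_cluster)
```

With errors that are correlated within individuals, the clustered standard error of the slope comes out noticeably larger than the conventional one, which is the pattern that can make pooled OLS results look more significant than they are.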

In order to decide whether you should use OLS or fixed effects you can use the Hausman test. The test compares the consistent but inefficient estimator (fixed effects) to the potentially inconsistent but efficient estimator (OLS). If the unobserved fixed effects do not bias your results, OLS and the fixed effects estimator should not differ significantly from each other, and you can use OLS. Otherwise you are better off with the fixed effects estimator: its point estimates remain consistent, at the cost of some efficiency, i.e. the standard errors will be larger.
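
As an illustration of the idea, the sketch below uses a regression-based variant of the comparison (a Mundlak-style auxiliary regression), not the classic Hausman statistic. It adds each individual's time mean of `x` to the pooled regression and tests whether its coefficient is zero, using cluster-robust variances. The data are simulated and all parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_units, n_years = 500, 7        # assumed panel dimensions

# Individual effect correlated with x, so pooled OLS should be rejected
alpha = rng.normal(size=(n_units, 1))
x = 0.8 * alpha + rng.normal(size=(n_units, n_years))
y = 1.0 * x + alpha + rng.normal(size=(n_units, n_years))

# Pooled regression of y on a constant, x and the individual mean of x
x_bar = np.repeat(x.mean(axis=1), n_years)
X = np.column_stack([np.ones(x.size), x.ravel(), x_bar])
yv = y.ravel()
ids = np.repeat(np.arange(n_units), n_years)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ yv
resid = yv - X @ beta

# Cluster-robust variance (clustered on the individual)
scores = np.zeros((n_units, X.shape[1]))
np.add.at(scores, ids, X * resid[:, None])
V = XtX_inv @ (scores.T @ scores) @ XtX_inv

# Wald test of H0: coefficient on x_bar is zero (chi-squared, 1 df)
wald = float(beta[2] ** 2 / V[2, 2])
CHI2_1_CRIT_5PCT = 3.841         # 5% critical value of chi-squared(1)
reject_pooled = wald > CHI2_1_CRIT_5PCT

print(f"slope on x: {beta[1]:.3f}  Wald: {wald:.1f}  reject pooled OLS: {reject_pooled}")
```

Once the individual mean is included, the coefficient on `x` coincides with the within estimate, so a significant coefficient on `x_bar` indicates that pooled OLS and fixed effects differ systematically.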
