Solved – panel fixed effects wage equations

econometrics, panel data

Can someone please explain fixed effects, cluster-robust standard errors, random effects, and between effects (BE) for panel data wage equations, and how to decide which is the most appropriate?

Best Answer

That's a very generic question that could be answered by any basic econometrics book. Suppose you have panel data and you want to regress earnings $y$ on some observable characteristics $X$ of an individual like age, birthplace, etc. The regression you would estimate is

$$y_{it} = \alpha + X'_{it} \beta + \epsilon_{it}$$

where the error term $\epsilon_{it} = c_i + \eta_{it}$ is made up of individual heterogeneity $c_i$, which does not vary over time (hence no $t$ subscript), and an idiosyncratic random shock $\eta_{it}$. In this context you may think of $c_i$ as an individual's ability, which is unobserved by the econometrician but potentially correlated with some of the observed individual characteristics.

Pooled ordinary least squares and random effects assume that the observable characteristics and the individual heterogeneity component are uncorrelated, $Cov(c_i,X_{it})=0$. If this does not hold, your predictors are correlated with the error term, which makes your estimates biased and inconsistent; that's the standard omitted variable bias.
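A small simulation makes this omitted variable bias concrete. The data-generating process is an assumption chosen for illustration (standard normal ability $c_i$, noise, and shocks, with the regressor built as $x_{it} = c_i + e_{it}$ so that $Cov(c_i, x_{it}) = 1$), and the function names are hypothetical. It is a plain-Python sketch, not a production estimator.

```python
import random


# Hypothetical data-generating process (an assumption for illustration):
#   x_it = c_i + e_it              -> the regressor is correlated with ability c_i
#   y_it = beta * x_it + c_i + eta_it
def simulate_panel(n_people=2000, n_periods=5, beta=1.0, seed=0):
    rng = random.Random(seed)
    rows = []  # one (person_id, x_it, y_it) tuple per person-period
    for i in range(n_people):
        c_i = rng.gauss(0.0, 1.0)  # time-invariant, unobserved "ability"
        for _ in range(n_periods):
            x = c_i + rng.gauss(0.0, 1.0)
            y = beta * x + c_i + rng.gauss(0.0, 1.0)
            rows.append((i, x, y))
    return rows


def pooled_ols_slope(rows):
    """Slope of y on x (with a constant), ignoring that c_i sits in the error term."""
    n = len(rows)
    x_bar = sum(x for _, x, _ in rows) / n
    y_bar = sum(y for _, _, y in rows) / n
    sxy = sum((x - x_bar) * (y - y_bar) for _, x, y in rows)
    sxx = sum((x - x_bar) ** 2 for _, x, _ in rows)
    return sxy / sxx


rows = simulate_panel()
# The true beta is 1.0, but the probability limit of pooled OLS here is
# beta + Cov(x, c) / Var(x) = 1.0 + 1/2 = 1.5, so the estimate lands near 1.5.
print(pooled_ols_slope(rows))
```

With this particular DGP the upward bias comes entirely from the positive covariance between the regressor and the omitted ability term.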

Fixed effects estimation uses the within transformation (subtracting each individual's time average) or first differencing to cancel out the unobserved individual fixed effects $c_i$. With two periods these two approaches give identical results, but this is not true for $T>2$. In the most basic version this is done by including a dummy for each of the $N$ individuals (with no separate intercept $\alpha$; equivalently, an intercept plus $N-1$ dummies), so you're basically giving every person their own intercept, which captures $c_i$, and then $Cov(c_i,X_{it})\neq 0$ is not a problem anymore because

$$y_{it} = X'_{it} \beta + \sum^N_{i=1} \delta_i D_i + \eta_{it}$$

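the individual fixed effects are estimated directly by the coefficients $\delta_i$ on the individual dummies $D_i$ (strictly, $\delta_i$ estimates $\alpha + c_i$, since the common intercept is absorbed into the dummies). Estimating this with dummies or using the within transformation gives identical estimates of $\beta$.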
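The equivalences above can be checked numerically on simulated data. This is a minimal plain-Python sketch with a single regressor: `make_panel`, `solve`, `lsdv_slope`, `within_slope`, and `fd_slope` are hypothetical helpers, and the simulated panel uses the same kind of assumed DGP as before. The dummy-variable regression is solved through the normal equations, which is fine for a tiny example but not how you would handle large $N$ in practice.

```python
import random


def make_panel(n_people, n_periods, seed):
    """Hypothetical balanced panel {person_id: [(x_it, y_it), ...]}, x correlated with c_i."""
    rng = random.Random(seed)
    panel = {}
    for i in range(n_people):
        c_i = rng.gauss(0.0, 1.0)
        panel[i] = []
        for _ in range(n_periods):
            x = c_i + rng.gauss(0.0, 1.0)
            panel[i].append((x, 1.0 * x + c_i + rng.gauss(0.0, 1.0)))  # true beta = 1
    return panel


def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z


def lsdv_slope(panel):
    """OLS of y on x plus one dummy per person and no common intercept."""
    ids = sorted(panel)
    X, y = [], []
    for i in ids:
        for x_it, y_it in panel[i]:
            X.append([x_it] + [1.0 if j == i else 0.0 for j in ids])
            y.append(y_it)
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * y_i for row, y_i in zip(X, y)) for p in range(k)]
    return solve(xtx, xty)[0]  # coefficient on x; the rest are the delta_i


def within_slope(panel):
    """Demean x and y within each person, then regress without a constant."""
    sxy = sxx = 0.0
    for obs in panel.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in obs)
        sxx += sum((x - x_bar) ** 2 for x, _ in obs)
    return sxy / sxx


def fd_slope(panel):
    """Regress the change in y on the change in x (consecutive periods), no constant."""
    sxy = sxx = 0.0
    for obs in panel.values():
        for (x0, y0), (x1, y1) in zip(obs, obs[1:]):
            sxy += (x1 - x0) * (y1 - y0)
            sxx += (x1 - x0) ** 2
    return sxy / sxx


two = make_panel(50, 2, seed=1)
five = make_panel(50, 5, seed=1)
print(within_slope(two), fd_slope(two))      # equal up to rounding when T = 2
print(within_slope(five), lsdv_slope(five))  # equal up to rounding: dummies = demeaning
print(within_slope(five), fd_slope(five))    # generally different when T > 2
```

The dummy and within results coincide by the Frisch–Waugh–Lovell theorem; the within version is what software uses because it avoids estimating $N$ extra coefficients.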

Fixed effects uses only the within variation in the data, that is, the variation over time within each individual's time series. Random effects additionally uses the variation between individuals: the random effects estimator is a matrix-weighted average of the within and between estimators, which makes it more efficient when its assumptions hold, i.e. your standard errors are smaller because you exploit more information from the data.
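One way to see how random effects sits between pooled OLS and fixed effects is the GLS "quasi-demeaning" transformation for a balanced panel: subtract $\theta$ times each person's mean, with $\theta = 1 - \sqrt{\sigma_\eta^2 / (\sigma_\eta^2 + T\sigma_c^2)}$. The sketch below assumes the variance components $\sigma_c^2$ and $\sigma_\eta^2$ are known (in practice they are estimated from within and between residuals), uses a single regressor, and simulates data where $Cov(c_i, x_{it}) = 0$; all names are hypothetical.

```python
import math
import random


def make_re_panel(n_people=1000, n_periods=5, seed=2):
    """Hypothetical balanced panel satisfying the RE assumption: x_it independent of c_i."""
    rng = random.Random(seed)
    panel = {}
    for i in range(n_people):
        c_i = rng.gauss(0.0, 1.0)  # sigma_c^2 = 1
        obs = []
        for _ in range(n_periods):
            x = rng.gauss(0.0, 1.0)
            obs.append((x, 1.0 * x + c_i + rng.gauss(0.0, 1.0)))  # sigma_eta^2 = 1, beta = 1
        panel[i] = obs
    return panel


def re_theta(sigma2_c, sigma2_eta, n_periods):
    """Quasi-demeaning weight of the RE (GLS) transformation for a balanced panel."""
    return 1.0 - math.sqrt(sigma2_eta / (sigma2_eta + n_periods * sigma2_c))


def quasi_demeaned_slope(panel, theta):
    """Slope of (y_it - theta*ybar_i) on (x_it - theta*xbar_i) with a constant.

    theta = 0 gives pooled OLS; theta = 1 gives the within (FE) estimator.
    """
    xs, ys = [], []
    for obs in panel.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        xs += [x - theta * x_bar for x, _ in obs]
        ys += [y - theta * y_bar for _, y in obs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


panel = make_re_panel()
theta = re_theta(sigma2_c=1.0, sigma2_eta=1.0, n_periods=5)  # 1 - sqrt(1/6), about 0.59
for label, t in [("pooled OLS", 0.0), ("random effects", theta), ("fixed effects", 1.0)]:
    print(label, quasi_demeaned_slope(panel, t))
```

As $T\sigma_c^2$ grows relative to $\sigma_\eta^2$, $\theta$ approaches 1 and random effects moves toward fixed effects.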

The typical procedure to decide between random and fixed effects is the Hausman test. You know that fixed effects gives you consistent estimates regardless of whether $Cov(c_i,X_{it})= 0$ holds, but it is less efficient than random effects. Random effects only gives you consistent estimates if $Cov(c_i,X_{it})= 0$ is true. The Hausman test compares the two sets of estimates: if the random effects estimates differ significantly from the fixed effects estimates, you reject $Cov(c_i,X_{it})= 0$, in which case fixed effects is the appropriate choice. In earnings regressions you are very likely to reject random effects, because unobserved ability and similar factors are likely to be correlated with the regressors.
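For a single coefficient the Hausman statistic reduces to a scalar formula, $H = (\hat\beta_{FE} - \hat\beta_{RE})^2 / (\widehat{Var}(\hat\beta_{FE}) - \widehat{Var}(\hat\beta_{RE}))$, compared with a $\chi^2_1$ critical value. The sketch below assumes that one-coefficient case (the general test uses the matrix form with the coefficient vector and covariance matrices); the estimates passed in are made-up numbers for illustration, not results from any data set, and `hausman_scalar` is a hypothetical name.

```python
CHI2_1DF_CRIT_5PCT = 3.841  # 95th percentile of the chi-squared distribution with 1 df


def hausman_scalar(b_fe, b_re, se_fe, se_re):
    """Hausman statistic for a single coefficient.

    H = (b_FE - b_RE)^2 / (Var(b_FE) - Var(b_RE)); under H0: Cov(c_i, X_it) = 0
    it is asymptotically chi-squared with 1 degree of freedom.
    """
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        raise ValueError("Var(b_FE) - Var(b_RE) must be positive for this form of the test")
    return (b_fe - b_re) ** 2 / var_diff


# Hypothetical estimates, made up for illustration only.
h = hausman_scalar(b_fe=0.08, b_re=0.10, se_fe=0.010, se_re=0.006)
# H is about 6.25 > 3.841, so random effects is rejected at the 5% level here.
print(h, "reject RE" if h > CHI2_1DF_CRIT_5PCT else "do not reject RE")
```

The guard against a non-positive variance difference reflects that, in finite samples, the estimated RE variance is not guaranteed to be smaller than the FE variance.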

Neither random effects nor fixed effects will be unbiased or consistent if $Cov(X_{it},\eta_{it})\neq 0$, i.e. if the regressors are correlated with the time-varying shock.
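A quick simulation shows why demeaning does not help here. The DGP is an assumption for illustration: the regressor loads on the shock, $x_{it} = c_i + e_{it} + 0.5\,\eta_{it}$, with all components standard normal and a true $\beta = 1$. The within transformation removes $c_i$, but $\eta_{it}$ survives, so the fixed effects slope converges to $1 + 0.5/1.25 = 1.4$. The names are hypothetical.

```python
import random


def simulate_endogenous_panel(n_people=2000, n_periods=5, seed=3):
    """Hypothetical DGP where x_it also loads on eta_it, so Cov(x_it, eta_it) != 0."""
    rng = random.Random(seed)
    panel = {}
    for i in range(n_people):
        c_i = rng.gauss(0.0, 1.0)
        obs = []
        for _ in range(n_periods):
            eta = rng.gauss(0.0, 1.0)
            x = c_i + rng.gauss(0.0, 1.0) + 0.5 * eta  # regressor reacts to the shock
            obs.append((x, 1.0 * x + c_i + eta))        # true beta = 1.0
        panel[i] = obs
    return panel


def within_slope(panel):
    """Fixed effects (within) slope: demean by person, regress without a constant."""
    sxy = sxx = 0.0
    for obs in panel.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in obs)
        sxx += sum((x - x_bar) ** 2 for x, _ in obs)
    return sxy / sxx


# Demeaning removes c_i but not eta_it; the probability limit is
# 1 + Cov(x~, eta~) / Var(x~) = 1 + 0.5 / 1.25 = 1.4, not 1.0.
print(within_slope(simulate_endogenous_panel()))
```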

Regarding the standard errors, it is common in this type of analysis to cluster them at the individual level. This makes them robust to autocorrelation: wages today are typically highly correlated with past values, so you might expect a person's shocks to be correlated over time within the individual time series. It also makes them robust to heteroscedasticity; for instance, at higher levels of education you observe much more variance in earnings than at low levels of education. Clustering standard errors at the individual level takes care of both problems (it changes the standard errors, not the point estimates).
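The clustered variance for the within estimator can be sketched for a single regressor as $\widehat{Var}(\hat\beta) = \frac{G}{G-1}\,\sum_i \big(\sum_t \tilde x_{it}\hat u_{it}\big)^2 / \big(\sum_{i,t}\tilde x_{it}^2\big)^2$, where $G$ is the number of individuals. Summing the scores within each person before squaring is what allows arbitrary correlation over time and person-specific variances. The $G/(G-1)$ factor is one simple finite-sample correction chosen here as an assumption; statistical packages apply their own adjustments. Function names and the tiny two-person panel are hypothetical.

```python
import math


def within_fit(panel):
    """Within (FE) slope, sum of squared demeaned x, and (x~, residual) pairs per person."""
    demeaned = {}
    sxy = sxx = 0.0
    for i, obs in panel.items():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        pairs = [(x - x_bar, y - y_bar) for x, y in obs]
        demeaned[i] = pairs
        sxy += sum(xd * yd for xd, yd in pairs)
        sxx += sum(xd * xd for xd, _ in pairs)
    beta = sxy / sxx
    resid = {i: [(xd, yd - beta * xd) for xd, yd in pairs] for i, pairs in demeaned.items()}
    return beta, sxx, resid


def clustered_se(panel):
    """Cluster-robust (by person) standard error of the within slope, single regressor."""
    _, sxx, resid = within_fit(panel)
    g = len(panel)  # number of clusters (individuals)
    # "Meat": square each person's summed score, so within-person correlation is allowed.
    meat = sum(sum(xd * u for xd, u in pairs) ** 2 for pairs in resid.values())
    return math.sqrt(g / (g - 1) * meat / sxx ** 2)


# Tiny hand-checkable example: two people, each observed twice, as (x_it, y_it).
panel = {"A": [(0.0, 0.0), (2.0, 4.0)], "B": [(1.0, 1.0), (3.0, 1.0)]}
print(within_fit(panel)[0], clustered_se(panel))  # 1.0 1.0
```

With so few clusters the clustered standard error is not reliable; the example only exists so the arithmetic can be checked by hand.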