Can someone please explain fixed effects, random effects, cluster-robust standard errors, and between effects for panel data wage equations, and how to decide which is the most appropriate?
Solved – panel fixed effects wage equations
econometrics, panel data
Related Solutions
You can and should use a well-specified random effects model. Always.
The Hausman test is often said to point to fixed effects models, but it can and should be viewed "as a standard Wald test for the omission of the variables $\widetilde{\mathbf{X}}$" (Baltagi 2008, §4.3), where $\widetilde{\mathbf{X}}$ is a matrix of deviations from group means. If you do not omit $\widetilde{\mathbf{X}}$, a random effects model gives you the same population (fixed) effects as a fixed effects model, plus the individual effects.
Mundlak (1978) argues that there is a unique estimator for the model $$\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\boldsymbol{\alpha}+\mathbf{u}\qquad\qquad \mathbf{Z}=\mathbf{I}_{N}\otimes\mathbf{e}_T$$ where $\mathbf{I}_{N}$ is an identity matrix, $\otimes$ denotes Kronecker product, $\mathbf{e}_T$ is a vector of ones, so $\mathbf{Z}$ is the matrix of individual dummies, and $\boldsymbol{\alpha}=(\alpha_1,\dots,\alpha_N)$.
If $\alpha_i=\overline{\mathbf{X}}_{i*}\boldsymbol{\pi}+w_{i}$ with $\boldsymbol{\pi}\ne\mathbf{0}$, where $\overline{\mathbf{X}}_{i*}$ averages over $t$ for a given $i$, the model can be written as $$\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\mathbf{P}(\mathbf{X}\boldsymbol{\pi}+\mathbf{w})+\mathbf{u}\qquad\qquad \mathbf{P}=\mathbf{I}_N\otimes\bar{\mathbf{J}}_T$$ where $\bar{\mathbf{J}}_T=\mathbf{J}_T/T$, $\mathbf{J}_T$ is a $T\times T$ matrix of ones, and $\mathbf{P}$ is the matrix which averages the observations across time for each individual (Baltagi 2008, §2.1). Under the fixed effects model, the within estimator is $$\hat{\boldsymbol{\beta}}_{w}=(\mathbf{X'QX})^{-1}\mathbf{X'Qy}\tag{1}$$ where $\mathbf{Q}=\mathbf{I}-\mathbf{P}$ is the matrix which obtains the deviations from individual means. Mundlak argues that under the random effects model, to get the same estimates the estimator should be $$\begin{bmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\boldsymbol{\pi}}\end{bmatrix}= \left(\begin{bmatrix}\mathbf{X}' \\ \mathbf{X'P}\end{bmatrix}\boldsymbol{\Sigma}^{-1}\begin{bmatrix}\mathbf{X}&\mathbf{PX} \end{bmatrix}\right)^{-1}\begin{bmatrix}\mathbf{X}' \\ \mathbf{X'P} \end{bmatrix}\boldsymbol{\Sigma}^{-1}\mathbf{y}\tag{2}$$ where $\boldsymbol{\Sigma}$ is the variance of the error term, while the "usual" estimator (the so-called "Balestra-Nerlove estimator") is $$\hat{\boldsymbol{\beta}}=(\mathbf{X}'\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\Sigma}^{-1}\mathbf{y}$$ which is biased. According to Mundlak, since $(1)$ and $(2)$ yield the same estimates for $\boldsymbol{\beta}$, $(2)$ reduces to the within estimator; i.e. $(1)$ is the unique estimator and does not depend on knowledge of the variance components.
However, the models $$\begin{align} \mathbf{y}&=\mathbf{X}\boldsymbol{\beta}+\mathbf{P}(\mathbf{X}\boldsymbol{\pi}+\mathbf{w})+\mathbf{u}\tag{FE} \\ \mathbf{y}&=\mathbf{X}\boldsymbol{\beta}+\mathbf{P}\mathbf{X}\boldsymbol{\pi}+(\mathbf{Pw}+\mathbf{u})\tag{RE} \end{align}$$ are formally equivalent (Hsiao 2003, §4.3), so a random effects model obtains the same estimates ... as long as you do not omit $\widetilde{\mathbf{X}}$! Let's try.
Data generation (R code):
library(plm) # provides plm(), used for the estimation below
set.seed(1234)
N <- 25 # individuals
T <- 5 # time
In <- diag(N) # identity matrix of order N
Int <- diag(N*T) # identity matrix of order N*T
Jt <- matrix(1, T, T) # matrix of ones of order T
Jtm <- Jt / T
P <- kronecker(In, Jtm) # averages the obs across time for each individual
s2a <- 0.3 # sigma^2_\alpha
s2u <- 0.6 # sigma^2_u
w <- rep(rnorm(N, 0, sqrt(s2a)), each = T)
u <- rnorm(N*T, 0, sqrt(s2u))
b <- c(1.5, -2)
p <- c(-0.7, 0.8)
X <- cbind(runif(N*T, 2, 5), runif(N*T, 4, 8))
XPX <- cbind(X, P %*% X) # [ X PX ]
y <- XPX %*% c(b,p) + (P %*% w + u) # y = Xb + PXp + Pw + u
ds <- data.frame(id=rep(1:N, each=T), wave=rep(1:T, N), y, split(X, col(X)))
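As a quick sanity check, the within estimator $(1)$ can also be computed directly from the simulated matrices; it should reproduce the plm "within" estimates reported below.
Q <- Int - P # Q = I - P, deviations from individual means
bw <- solve(t(X) %*% Q %*% X, t(X) %*% Q %*% y) # equation (1)
bw # matches the within estimates below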
Under a fixed effects model we get:
> fe.1 <- plm(y ~ X1 + X2, data=ds, model="within")
> summary(fe.1)$coefficients
Estimate Std. Error t-value Pr(>|t|)
X1 1.435987 0.07825464 18.35019 1.806239e-33
X2 -1.916447 0.06339342 -30.23100 1.757634e-51
while under a random effects model...
> re.1 <- plm(y ~ X1 + X2, data=ds, model="random")
> summary(re.1)$coefficients
Estimate Std. Error t-value Pr(>|t|)
(Intercept) 1.830633 0.51687109 3.541759 5.638216e-04
X1 1.405060 0.07927271 17.724390 1.505521e-35
X2 -1.874784 0.06372731 -29.418846 3.076414e-57
bias!
But what if we do not omit $\widetilde{\mathbf{X}}=\mathbf{QX}$?
> Q <- diag(N*T) - P
> X1.mean <- P %*% ds$X1
> X1.dev <- Q %*% ds$X1
> X2.mean <- P %*% ds$X2
> X2.dev <- Q %*% ds$X2
> re.2 <- plm(y ~ X1.mean + X1.dev + X2.mean + X2.dev, data=ds, model="random")
> summary(re.2)$coefficients
Estimate Std. Error t-value Pr(>|t|)
(Intercept) -0.04123108 2.30907450 -0.01785611 9.857833e-01
X1.mean 0.81279279 0.38146339 2.13072292 3.515287e-02
X1.dev 1.43598746 0.07824535 18.35236883 1.239171e-36
X2.mean -1.23071499 0.26379329 -4.66545216 8.072196e-06
X2.dev -1.91644653 0.06338590 -30.23458903 5.809240e-58
The estimates for X1.dev and X2.dev are equal to the within estimates for X1 and X2 (no room for Hausman tests!), and you get much more. You get what you need.
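A quick numerical check, using the two fits above (the equality should hold up to numerical tolerance):
all.equal(as.numeric(coef(fe.1)),
          as.numeric(coef(re.2)[c("X1.dev", "X2.dev")])) # TRUE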
However, this is just the tip of the iceberg. I recommend that you read at least Bafumi and Gelman (2006), Snijders and Berkhof (2008), and Bell and Jones (2014).
References
Baltagi, Badi H. (2008), Econometric Analysis of Panel Data, John Wiley & Sons
Bafumi, Joseph and Andrew Gelman (2006), Fitting Multilevel Models When Predictors and Group Effects Correlate, http://www.stat.columbia.edu/~gelman/research/unpublished/Bafumi_Gelman_Midwest06.pdf
Bell, Andrew and Kelvyn Jones (2014), "Explaining Fixed Effects: Random Effects modelling of Time-Series Cross-Sectional and Panel Data", Political Science Research and Methods, http://dx.doi.org/10.7910/DVN/23415
Hsiao, Cheng (2003), Analysis of Panel Data, Cambridge University Press
Mundlak, Yair (1978), "On the Pooling of Time Series and Cross Section Data", Econometrica, 46(1), 69-85
Snijders, Tom A. B. and Johannes Berkhof (2008), "Diagnostic Checks for Multilevel Models", in: Jan de Leeuw and Erik Meijer (eds), Handbook of Multilevel Analysis, Springer, Chap. 3
Your post brings up a couple of different things (hopefully you recognize that).
The first relates to deciding random vs fixed effects. In my experience deciding between fixed and random effects has two pieces:
Statistical fit. Assessed using things like a Hausman test, standard fare in most packages like Stata, SAS, R, etc. This will tell you if a random intercept "works" better with your data than a fixed effect.
Theoretical fit. How are you conceptualizing the effect? Is it truly a fixed, unchanging entity, not coming from a theoretical distribution? For example, I have rarely seen states treated as random effects: there are 50 and only 50 states (or 51 with DC, or more if you add territories), and they are always the same states. The same goes for years when there are only a few in the panel; those are often treated as fixed, because you want to capture a common shock to all observations in that year and quantify it as a fixed effect. Other things, however, are not so clear. I'm doing an analysis of counties: of course you could treat counties as fixed effects, but there are over 3,000 of them, and I don't think anyone would really want to be so focused on a single county. So I'm treating them as a random effect, coming from a distribution. When doing repeated measures, again, the individual is treated as random (representative, hopefully, of a larger population whose parameters you estimate). A sketch contrasting the two specifications follows this list.
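To make the two conceptualizations concrete, here is a hypothetical sketch in R (the data frame d and the variables y, x, county are made-up names, not from the question):
library(lme4) # lmer() for multilevel / random effects models
# counties as fixed effects: one intercept shift per county
m.fixed <- lm(y ~ x + factor(county), data = d)
# counties as random effects: intercepts drawn from a common distribution
m.random <- lmer(y ~ x + (1 | county), data = d)
With 3,000+ counties, the fixed specification spends 3,000+ degrees of freedom on intercepts, while the random specification estimates only their distribution (a mean and a variance).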
The second issue you bring up is intercept vs slope. In this regard random effects and fixed effects are not comparable. A fixed effect literally just adjusts the intercept for each group: it captures the mean relationship between the given effect and the outcome variable. As a result, the slopes you get are within-group effects, because you've already captured the variance attributable to the fixed effects. If you think the slope is different for each effect, the interpretation is that the effects moderate the effect of the given variable whose slope you are interested in:
$y=\alpha+\beta x + \sum_i Z_i\theta_i$
where $x$ is your covariate of interest, $Z_i$ is the dummy for fixed effect $i$, and $\theta_i$ is its coefficient. Now, if you think the slope is going to differ for each effect, you actually create an interaction term or moderator for each effect (which explodes the number of coefficients and cuts your degrees of freedom):
$y=\alpha+\beta x + \sum_i Z_i\theta_i+\sum_i Z_ix\gamma_i$
$\boldsymbol{\gamma}$ is your vector of interaction coefficients, so to get the slope for any given effect you add up $\beta$ and the relevant $\gamma_i$. So what does $\beta$ mean here? It is not the average slope across all the effects, as it is in a random effects model. It is the slope for the omitted effect.
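In R the moderated version is just an interaction (again a hypothetical sketch with made-up names d, y, x, g):
# common slope beta, one intercept theta_i per group
m1 <- lm(y ~ x + factor(g), data = d)
# group-specific slopes: the x:factor(g) coefficients are the gamma_i
m2 <- lm(y ~ x * factor(g), data = d)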
So, to get back to your original question:
- Start with the theoretical definition of your effects. Are they truly fixed, or do they reflect a random distribution that has population parameters you can estimate? Answer that, and then I believe you should be on your way to figuring out the next step.
- If you believe that the effects should indeed be random, then do your statistical tests. Test whether random effects fit better, then do your appropriate tests to see if a random coefficient is appropriate.
Best Answer
That's a very generic question that could be answered by any basic econometrics book. Suppose you have panel data and you want to regress earnings $y$ on some observable characteristics $X$ of an individual like age, birthplace, etc. The regression you would estimate is
$$y_{it} = \alpha + X'_{it} \beta + \epsilon_{it}$$
where the error term $\epsilon_{it} = c_i + \eta_{it}$, i.e. it is a function of individual heterogeneity $c_i$, which does not vary over time (hence no $t$ subscript), and a random shock $\eta_{it}$. In this context you may think of $c_i$ as an individual's ability, which is unobserved by the econometrician but potentially correlated with some of the observed individual characteristics.
Pooled ordinary least squares and random effects assume that the observable characteristics and the individual heterogeneity component are uncorrelated, $Cov(c_i,X_{it})=0$. If this does not hold then there is a correlation between your predictors and the error term which will bias your estimates - that's the standard omitted variables bias.
Fixed effects estimation uses the within transformation or first differencing to cancel out the unobserved individual fixed effects $c_i$. For two periods these two approaches give identical results, but this is not true for $T>2$. In the most basic version this is done by including a dummy for each of the $N$ individuals (and dropping the common intercept), so you're basically giving every person their own intercept, which will capture $c_i$; then $Cov(c_i,X_{it})\neq 0$ is not a problem anymore, because in
$$y_{it} = X'_{it} \beta + \sum^N_{i=1} \delta_i D_i + \eta_{it}$$
the individual fixed effects $c_i$ are directly estimated by the coefficients $\delta_i$ on the individual dummies $D_i$. Estimating this with dummies or using the within transformation gives identical slope estimates.
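A minimal sketch of the equivalent routes in R (hedged: d is a hypothetical panel data frame with columns id, year, y, and x):
# least squares dummy variables (LSDV): one dummy per individual
lsdv <- lm(y ~ x + factor(id), data = d)
# within transformation via plm
library(plm)
fe <- plm(y ~ x, data = d, index = c("id", "year"), model = "within")
# first differences: identical to "within" only when T = 2
fd <- plm(y ~ x, data = d, index = c("id", "year"), model = "fd")
Here coef(lsdv)["x"] and coef(fe)["x"] coincide, while coef(fd)["x"] will generally differ when $T>2$.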
Fixed effects uses only the within variation in the data, that's the variation you see for every individual time series. Random effects instead also uses the variation between individuals and comes up with a matrix weighted average of the within and between variation that allows for a more efficient estimation, i.e. your standard errors are smaller because you exploit more information from the data.
The typical procedure to decide between random and fixed effects is the Hausman test. You know that fixed effects will give you consistent estimates regardless of whether $Cov(c_i,X_{it})\neq 0$ or not, but it is less efficient than random effects. Random effects will only give you consistent estimates if $Cov(c_i,X_{it})= 0$ holds. The Hausman test compares the two models, and if the random effects estimates differ significantly from the fixed effects estimates you reject $Cov(c_i,X_{it})= 0$, in which case it is appropriate to use fixed effects. In the context of earnings regressions it is very likely that you will reject random effects, for reasons of unobserved ability and the like.
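With plm this is one line (a sketch continuing the hypothetical fits above):
re <- plm(y ~ x, data = d, index = c("id", "year"), model = "random")
phtest(fe, re) # small p-value: reject Cov(c_i, X_it) = 0, prefer fixed effects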
Neither random effects nor fixed effects will be consistent and unbiased if $Cov(X_{it},\eta_{it})\neq 0$.
Regarding the standard errors, it is common in this type of analysis to cluster them at the individual level. This corrects for autocorrelation: wages today are typically highly correlated with past values, so you might expect that shocks to a person's wage are correlated over time within the individual time series. It also corrects for heteroscedasticity: for instance, at higher levels of education you observe much more variance in earnings than at low levels of education. Clustering standard errors at the individual level takes care of both problems.
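One way to obtain such standard errors for a plm fit (a sketch, continuing the hypothetical fe object above; vcovHC has a method for plm objects, coeftest comes from lmtest):
library(lmtest) # coeftest()
# cluster-robust covariance at the individual ("group") level
coeftest(fe, vcov = vcovHC(fe, type = "HC1", cluster = "group"))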