Solved – Account for both between and within variance in panel data

logistic, mixed model, panel data, regression

I have a general question on fixed-effects and mixed-effects models for panel data. I am doing a logistic regression on panel data, with data measured on the individual level.

I understand that by using fixed effects in my model (xtlogit ..., fe in Stata) I basically ignore the differences between individuals and use only the within variance of the IVs to calculate the coefficients.

Question: But what if there's also considerable difference between the individuals that I would like to capture? Is it even possible to account for both the unobserved heterogeneity (through fixed-effects) and the variance between individuals?

I have read a lot about random effects, population-averaged models, the between estimator and mixed effects, but I can't bridge my intellectual gap on the question "when to use which model?".

I feel that mixed-effects logistic regression (xtmelogit in Stata) might apply, but the term "random intercept" makes me wonder whether it really captures the variance between individuals in panel data. Also, my data is not nested as in most of the examples on mixed-effects models, which speak of clusters or group levels (e.g. students in schools).

Best Answer

You can account for a certain kind of unobserved heterogeneity in panel data, an approach known as correlated random effects, if you are willing to make assumptions about how the unobserved heterogeneity is correlated with the observed regressors.

Let us say $y_{it}$ is your outcome of interest (perhaps a binary variable), $X_{it}$ are observable individual characteristics, $\gamma_{i}$ is a time-invariant unobserved individual effect, and $u_{it}$ are errors that are independent across individuals (though possibly correlated over time). For simplicity I do not use a non-identity link function here, even though the outcome may be binary, so the model of interest is

$y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_i + u_{it}$

Keep in mind that fixed effects allow for an arbitrary correlation between the unobserved time-invariant individual heterogeneity and the other characteristics, $Cov(\gamma_i,X_{it})$, while in the random-effects world there must not be any correlation: $Cov(\gamma_i,X_{it})=0$. If you know that the unobserved heterogeneity in your panel data depends on the observed characteristics in a certain way, you can model that dependence. A famous example uses the individual mean of the observed variables over time; see Mundlak (1978):

$\gamma_i = \alpha_0 + \alpha_1 \bar{X}_i + \epsilon_i$
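Substituting this expression for $\gamma_i$ into the outcome equation shows what is actually estimated (same symbols as above, still with the identity link used for simplicity):

$y_{it} = (\beta_0 + \alpha_0) + \beta_1 X_{it} + \alpha_1 \bar{X}_i + \epsilon_i + u_{it}$

The individual mean $\bar{X}_i$ enters as an ordinary regressor and carries the between-individual differences through $\alpha_1$, while $\beta_1$ is identified from the within variation. The remaining individual effect $\epsilon_i$ can then be treated as a random effect, provided it is uncorrelated with $X_{it}$.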

But it is crucial that your assumptions about this specific dependence hold. A more general approach was introduced by Chamberlain (1982).
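To see the Mundlak device at work, here is a minimal simulation sketch in Python using only NumPy. It sticks to the linear case used above, not a logit. Everything in it is an assumption made for illustration: the panel dimensions, the true coefficient of 2, the way $X_{it}$ is made to depend on $\gamma_i$, and the helper name `ols`. It compares pooled OLS, the within (fixed-effects) estimator, and pooled OLS with the individual mean $\bar{X}_i$ added as a regressor.

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 500, 6                        # hypothetical panel: 500 individuals, 6 periods
ids = np.repeat(np.arange(n), t)     # individual index for every observation

gamma = rng.normal(size=n)                                # unobserved individual effect
x = 0.8 * gamma[ids] + rng.normal(size=n * t)             # regressor correlated with gamma
y = 1.0 + 2.0 * x + gamma[ids] + rng.normal(size=n * t)   # true beta_1 = 2 (assumed)


def ols(columns, target):
    """Least-squares coefficients for an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(target))] + columns)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


# Individual means over time, expanded back to one value per observation.
counts = np.bincount(ids)
x_bar = (np.bincount(ids, weights=x) / counts)[ids]
y_bar = (np.bincount(ids, weights=y) / counts)[ids]

beta_pooled = ols([x], y)[1]                     # ignores gamma: biased here
beta_within = ols([x - x_bar], y - y_bar)[1]     # fixed-effects (within) estimator
beta_mundlak, alpha_1 = ols([x, x_bar], y)[1:3]  # Mundlak: add x_bar as a regressor

print(f"pooled  beta_1: {beta_pooled:.3f}")
print(f"within  beta_1: {beta_within:.3f}")
print(f"Mundlak beta_1: {beta_mundlak:.3f}, alpha_1: {alpha_1:.3f}")
```

In this linear setting the coefficient on $X_{it}$ in the Mundlak regression coincides with the within estimator, while the coefficient on $\bar{X}_i$ picks up the between-individual part. That exact equality is a property of the linear model. For a binary outcome, the analogous step would be to add $\bar{X}_i$ to a random-effects logit, where the result depends on the assumed form of the dependence being correct.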

Imbens and Wooldridge talk about these methods in their lecture series:

http://www.nber.org/WNE/lect_2_linpanel.pdf

And here is another source discussing these topics:

http://www.u.arizona.edu/~hirano/696_2010/ln10.pdf