Panel regression with multiple fixed effects and heterogeneity

For a research project I am supposed to estimate a panel regression model on a dataset of user data observed over time (the sample is assumed to represent the general population). My supervisor is adamant about estimating a linear regression with fixed effects for variables such as gender, education level, month of year and so on.

As far as I know, the traditional way to account for heterogeneity is a fixed effect for each user (here, on their IDs). So I do not know how to interpret this strategy or how it accounts for heterogeneity, and I cannot get any more information from the supervisor.

Any help with understanding this issue would be appreciated 🙂

Best Answer

I agree with you that the most natural model to estimate is the two-way fixed effects model:

$$ Y_{it} = X_{it}'\beta + c_i + \theta_t + v_{it} $$

where $c_i$ is an individual fixed effect and $\theta_t$ is a time fixed effect. But, indeed, the effects of variables that do not change over time, such as a gender dummy, are not estimable here: the standard techniques for getting rid of the individual fixed effect (e.g. first differencing or the within transformation) get rid of the time-invariant variables as well.
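A minimal sketch of why this happens, in plain Python with a made-up three-person, four-period balanced panel. The two-way within transformation $z_{it} - \bar{z}_i - \bar{z}_t + \bar{z}$ (which is the exact two-way fixed effects transformation only when the panel is balanced) removes anything of the form $a_i + b_t$. A gender dummy therefore disappears along with $c_i$ and $\theta_t$. The function name `two_way_demean` and all data values are hypothetical.

```python
from statistics import mean

def two_way_demean(panel):
    """Two-way within transformation for a balanced panel.

    panel[i][t] is the value for individual i in period t. Returns
    z_it - zbar_i - zbar_t + zbar, which removes any term of the form a_i + b_t.
    """
    n, T = len(panel), len(panel[0])
    row_means = [mean(row) for row in panel]                               # zbar_i
    col_means = [mean(panel[i][t] for i in range(n)) for t in range(T)]   # zbar_t
    grand = mean(row_means)                                                # zbar
    return [[panel[i][t] - row_means[i] - col_means[t] + grand for t in range(T)]
            for i in range(n)]

# Hypothetical 3-person, 4-period panel.
gender = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1]]              # time-invariant dummy
month_shock = [[0.0, 0.3, 0.1, 0.5] for _ in range(3)]           # common time effect
hours = [[2.0, 3.5, 1.0, 4.0], [5.0, 4.5, 6.0, 3.0], [1.0, 2.0, 2.5, 3.5]]  # time-varying

print(two_way_demean(gender))       # (numerically) all zero: nothing left to estimate
print(two_way_demean(month_shock))  # (numerically) all zero as well
print(two_way_demean(hours))        # generally nonzero, so its coefficient is estimable
```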

Some ideas:

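1) You could let the coefficients on the time-invariant variables change over time: $$ Y_{it} = X_i'\beta_t + c_i + \theta_t + v_{it} $$ Because $X_i'\beta_1$ cannot be separated from $c_i$, only the changes $\beta_t - \beta_1$ relative to a base period are identified.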

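2) You can use a random effects approach, $$ Y_{it} = X_i'\beta + \theta_t + u_{it}, \qquad u_{it} = c_i + v_{it}, $$ but this requires the stronger assumption that the $X$ variables are uncorrelated with the unobserved individual effect $c_i$, which now sits in the composite error term, and not only with the time-varying error $v_{it}$.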

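3) You could use a correlated random effects (Mundlak) approach, i.e. assume $E[c_i \mid X_{i1}, \dots, X_{iT}] = \bar{X}_i'\psi$, where $\bar{X}_i$ is the time average of individual $i$'s regressors, and estimate the model with $\bar{X}_i$ added as extra regressors.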

4) You could use a Hausman-Taylor approach, which estimates the effect of a time-invariant variable using internal instruments (individual means of time-varying regressors that are assumed to be uncorrelated with $c_i$).
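To make idea 1 concrete, here is a small simulation sketch in plain Python. All parameter values are made up, and `g` stands in for a single binary time-invariant variable such as a gender dummy. The individual effect `c` is deliberately correlated with `g`, so a plain comparison of men and women would be confounded. In a balanced panel with one binary $g_i$, the two-way fixed effects OLS estimate of $\beta_t - \beta_1$ from the $g_i \times$ period interactions reduces to a difference-in-differences of cell means. The code uses that shortcut instead of running a full regression, and it takes period 0 as the base period.

```python
import random
from statistics import mean

def simulate_panel(n_people=4000, beta_t=(1.0, 1.2, 0.8, 1.5),
                   theta_t=(0.0, 0.5, 1.0, 1.5), seed=1):
    """Simulate Y_it = g_i*beta_t + c_i + theta_t + v_it (hypothetical values).

    g_i is binary and time-invariant; c_i is correlated with g_i on purpose.
    """
    rng = random.Random(seed)
    rows = []  # (person id, period, g, y)
    for i in range(n_people):
        g = 1 if rng.random() < 0.5 else 0
        c = 2.0 * g + rng.gauss(0.0, 1.0)  # individual effect, correlated with g
        for t, (b, th) in enumerate(zip(beta_t, theta_t)):
            y = g * b + c + th + rng.gauss(0.0, 1.0)
            rows.append((i, t, g, y))
    return rows

def interaction_effects(rows, n_periods):
    """Estimate beta_t - beta_0 for t = 1..n_periods-1 (period 0 is the base).

    Assumes a balanced panel and a single binary g, in which case the two-way
    FE estimate equals this difference-in-differences of group-by-period means.
    """
    cells = {}
    for _, t, g, y in rows:
        cells.setdefault((g, t), []).append(y)
    m = {key: mean(ys) for key, ys in cells.items()}
    return [(m[(1, t)] - m[(1, 0)]) - (m[(0, t)] - m[(0, 0)])
            for t in range(1, n_periods)]

rows = simulate_panel()
est = interaction_effects(rows, n_periods=4)
print([round(e, 2) for e in est])  # compare with the true changes 0.2, -0.2, 0.5
```

The level $\beta_1$ (the baseline gender gap) never appears in the output. That is the identification limit noted in idea 1, not a shortcoming of the sketch.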

Edit

Based on your reply, I guess what you have in mind is either the first model that I wrote down (the two-way fixed effects model) or a model like $$ Y_{it} = X_{it}'\beta + W_i \delta + \theta_t + v_{it} $$ where $W_i$ is some time-invariant observed characteristic (e.g. the individual's race).

Here is the difference between these two models. In the first one, $X_{it}$ can be correlated with any time-invariant unobserved variable, and you will still be able to estimate $\beta$ consistently. In the second model, any time-invariant unobserved characteristics that are left out end up in $v_{it}$. If any variable in $X$ or $W$ is correlated with them, then in general none of the parameters will be estimated consistently. This seems like a big disadvantage and something you ought to think very carefully about in your particular application. The advantage of the second model, though, is that you do get an estimate of the effect of $W$ on the outcome.
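The contrast described above can be illustrated with a small simulation sketch in plain Python. Parameter values are hypothetical, and the time effects $\theta_t$ are left out for brevity. An unobserved time-invariant trait `a` plays the role of the omitted characteristic. It is correlated with both $x_{it}$ and $W_i$ but is not included in the second model. Pooled OLS of the second model then misses both $\beta$ and $\delta$. The within (individual fixed effects) estimator still recovers $\beta$, but the demeaning wipes out $W_i$, so it gives no estimate of $\delta$. The small `ols` helper is written out only to keep the example dependency-free.

```python
import random

def ols(X, y):
    """OLS via the normal equations X'X b = X'y, solved by Gaussian elimination.
    Adequate for a handful of regressors; not meant for production use."""
    k = len(X[0])
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    v = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * p for a, p in zip(A[r], A[col])]
            v[r] -= f * v[col]
    b = [0.0] * k
    for r in reversed(range(k)):  # back substitution
        b[r] = (v[r] - sum(A[r][c] * b[c] for c in range(r + 1, k))) / A[r][r]
    return b

rng = random.Random(42)
BETA, DELTA = 1.0, 0.5   # hypothetical true effects of x_it and W_i
N, T = 2000, 5
X, y, ids = [], [], []
for i in range(N):
    w = 1.0 if rng.random() < 0.5 else 0.0   # observed, time-invariant W_i
    a = 0.8 * w + rng.gauss(0.0, 1.0)        # unobserved, time-invariant, omitted
    for t in range(T):
        x = a + rng.gauss(0.0, 1.0)          # x_it is correlated with a
        X.append([1.0, x, w])
        y.append(BETA * x + DELTA * w + a + rng.gauss(0.0, 1.0))
        ids.append(i)

# Second model by pooled OLS: in this design the estimates drift toward
# roughly 1.5 (for beta) and 0.9 (for delta) instead of 1.0 and 0.5.
pooled = ols(X, y)  # [intercept, beta_hat, delta_hat]

# Within estimator: demean x and y by person; w_i is removed by the demeaning.
sums = {}
for i, row, yi in zip(ids, X, y):
    s = sums.setdefault(i, [0.0, 0.0, 0])
    s[0] += row[1]
    s[1] += yi
    s[2] += 1
num = den = 0.0
for i, row, yi in zip(ids, X, y):
    sx, sy, n = sums[i]
    xd, yd = row[1] - sx / n, yi - sy / n
    num += xd * yd
    den += xd * xd
beta_fe = num / den

print("pooled beta, delta:", round(pooled[1], 2), round(pooled[2], 2))
print("within beta:", round(beta_fe, 2))
```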
