Solved – Panel regression with multiple fixed effects and heterogeneity

fixed-effects-model, heterogeneity, panel data, regression

For a research project I am supposed to estimate a panel regression model on a dataset with user data over the observation period (the sample is assumed to represent the general population). The supervisor is adamant about estimating a linear regression with fixed effects on variables such as gender, education level, month of year and so on.

This is not the traditional way to account for heterogeneity, which as far as I know is to put a fixed effect on users (in this case on their IDs). I do not know how to interpret such a strategy or how it accounts for heterogeneity, and I cannot get any more information from the supervisor.

Any help with understanding this issue would be appreciated 🙂

Best Answer

I agree with you that the most natural model to estimate is the two-way fixed effects model

$$ Y_{it} = X_{it}'\beta + c_i + \theta_t + v_{it} $$

where $c_i$ is an individual fixed effect and $\theta_t$ is a time fixed effect. But, indeed, the effects of variables that do not change over time, such as a gender dummy, are not estimable here, because the standard techniques for removing the individual fixed effect (e.g. differencing or the within transformation) also remove every time-invariant variable.
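To make this concrete, here is a minimal base R sketch on simulated data. The column names (`id`, `female`, `x`) and the helper `demean` are made up for illustration. It applies the within transformation by hand and shows that a time-invariant dummy is wiped out while a time-varying regressor keeps its variation.

```r
set.seed(1)
n_id <- 50   # number of individuals (assumed)
n_t  <- 4    # number of periods (assumed)

panel <- data.frame(
  id = rep(seq_len(n_id), each = n_t),
  t  = rep(seq_len(n_t), times = n_id)
)
# Time-invariant dummy: one draw per individual, repeated over time
panel$female <- rep(as.numeric(rbinom(n_id, 1, 0.5)), each = n_t)
# Time-varying regressor
panel$x <- rnorm(n_id * n_t)

# Within transformation: subtract each individual's own time average
demean <- function(v, g) v - ave(v, g)

female_within <- demean(panel$female, panel$id)
x_within      <- demean(panel$x, panel$id)

all(female_within == 0)  # the dummy has no within-individual variation left
var(x_within) > 0        # the time-varying regressor still varies
```

Because the transformed dummy is identically zero, no coefficient on it can be recovered from a regression on the transformed data.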

Some ideas:

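1) You could let the coefficients on the time-invariant variables change over time: $$ Y_{it} = X_i'\beta_t + c_i + \theta_t + v_{it} $$ Here the level of the effect is still absorbed by $c_i$, so only changes in $\beta_t$ relative to a base period are identified.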

2) You can use a random effects approach $$ Y_{it} = X_i'\beta + \theta_t + v_{it} $$ where the individual effect is treated as part of the error term, but this will require the stronger assumption that the $X$ variables are uncorrelated with the time-invariant part of the error term

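3) You could use a correlated random effects approach, i.e. assume $E[c_i \mid X_i] = \bar{X}_i'\psi$, where $\bar{X}_i$ is the time average of the time-varying covariates for individual $i$, and perhaps get estimates this way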

4) You could use a Hausman-Taylor approach and estimate the effect of a time invariant variable using an internal instrument
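As an illustration of idea 3, the following base R sketch uses a Mundlak-style regression on simulated data. All variable names and parameter values are assumptions made for the example. It adds the individual time average of `x` to a pooled regression that also contains a time-invariant dummy and time dummies. In a balanced panel, the coefficient on `x` then coincides numerically with the two-way fixed effects estimate, while the time-invariant dummy remains estimable.

```r
set.seed(42)
n_id <- 200  # individuals (assumed)
n_t  <- 5    # periods (assumed), balanced panel

d <- data.frame(
  id = rep(seq_len(n_id), each = n_t),
  t  = rep(seq_len(n_t), times = n_id)
)
c_i      <- rnorm(n_id)                        # unobserved individual effect
female_i <- as.numeric(rbinom(n_id, 1, 0.5))   # time-invariant dummy
d$female <- female_i[d$id]
d$x <- 0.8 * c_i[d$id] + rnorm(nrow(d))        # x is correlated with c_i
d$y <- 1.0 * d$x + 0.5 * d$female + c_i[d$id] + 0.1 * d$t + rnorm(nrow(d))

# Individual time average of the time-varying regressor (the Mundlak term)
d$x_bar <- ave(d$x, d$id)

# Correlated random effects: pooled OLS with x_bar and time dummies
cre <- lm(y ~ x + x_bar + female + factor(t), data = d)

# Two-way fixed effects via individual and time dummies, for comparison
twfe <- lm(y ~ x + factor(id) + factor(t), data = d)

coef(cre)["x"]       # matches the two-way fixed effects coefficient on x
coef(twfe)["x"]
coef(cre)["female"]  # estimable here, unlike under individual fixed effects
```

The coefficient on `female` is only trustworthy if the part of $c_i$ not captured by $\bar{X}_i$ is unrelated to `female`. That holds in this simulation by construction but is an assumption in real data.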

Edit

Based on your reply, I guess what you have in mind is either the first model that I wrote down or a model like $$ Y_{it} = X_{it}'\beta + W_i \delta + \theta_t + v_{it} $$ where $W$ is some time-invariant observed characteristic (e.g. an individual's race). Here's the difference between these two models. In the first one, $X_{it}$ can be correlated with any time-invariant unobserved variable and you will still be able to consistently estimate $\beta$. In the second model, if any variables in $X$ or $W$ are correlated with time-invariant unobserved characteristics that are left out of the model then, in general, none of the parameters will be consistently estimated. This seems like a big disadvantage and something that you ought to think very carefully about in your particular application. The advantage of the second model, though, is that you do get an estimate of the effect of $W$ on the outcome.
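The trade-off between the two models can be seen in a small simulation. This is a sketch under assumed values: the true $\beta$ is set to 1, the unobserved effect `c_i` is deliberately correlated with `x`, and time effects are left out for brevity. The within estimator stays close to the true value, while the pooled regression with the observed time-invariant variable `w` picks up the omitted `c_i`.

```r
set.seed(123)
n_id <- 500  # individuals (assumed)
n_t  <- 5    # periods (assumed)
id   <- rep(seq_len(n_id), each = n_t)

c_i <- rnorm(n_id)                         # unobserved, time-invariant
w_i <- as.numeric(rbinom(n_id, 1, 0.5))    # observed, time-invariant
w   <- w_i[id]
x   <- c_i[id] + rnorm(n_id * n_t)         # x is correlated with c_i
y   <- 1.0 * x + 0.5 * w + 2.0 * c_i[id] + rnorm(n_id * n_t)  # true beta = 1

# Model with individual fixed effects: within transformation, then OLS
demean <- function(v, g) v - ave(v, g)
y_w <- demean(y, id)
x_w <- demean(x, id)
fe_fit <- lm(y_w ~ 0 + x_w)

# Model with W_i but no individual effect: pooled OLS
pooled_fit <- lm(y ~ x + w)

b_fe     <- unname(coef(fe_fit)["x_w"])
b_pooled <- unname(coef(pooled_fit)["x"])

b_fe      # close to the true value of 1
b_pooled  # pushed upward by the omitted c_i
```

In this design the pooled coefficient on `w` happens to be fine because `w` was drawn independently of `c_i`. If `w` were also correlated with `c_i`, its coefficient would be contaminated as well.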

Related Solutions

Solved – Multiple Time Fixed Effects in Panel Regression

You can model year/time/week in various ways. First of all, I wonder if there really is an effect of month when adjusting for week? It depends on what you measure, obviously, but still, any effect that varies across the year should be taken care of by week.

In any case, I recommend using regression splines in a generalized additive model to do this. To take the correlation between repeated measurements within each city into account, you need to use a mixed model, so a generalized additive mixed model will be fine. Using R code:

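library(mgcv)
# 'dat' is a placeholder name for your data frame; gamm() expects random effects in list form
M1 <- gamm(outcome ~ temperature + precipitation + s(year) + s(month) + s(week),
           random = list(city = ~1), data = dat)

And you can also try without month:

M2 <- gamm(outcome ~ temperature + precipitation + s(year) + s(week),
           random = list(city = ~1), data = dat)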

You can now compare the models since the models are nested:

anova(M1$lme, M2$lme)

The second model is nested within the first, and a low p-value indicates that month should be kept, and a high p-value indicates that it should be dropped from the model.
