Having an unbalanced panel is not a problem nowadays. In the past, when econometrics had to be done by hand, inverting matrices for unbalanced panels was more difficult, but for computers this is not an issue. The only remaining worry is why the panel is unbalanced: is it due to attrition? If so, is that attrition random or related to characteristics of the statistical units? For instance, in surveys people with higher education tend to be more responsive and therefore stay in the panel longer.
Regarding the fixed effects model, have you checked whether the variables that are time-invariant in theory actually do not vary over time? Sometimes coding errors sneak in, and all of a sudden a variable varies over time when it shouldn't. One way of checking this is to use the xtsum
command, which displays overall, between, and within summary statistics. The time-invariant variables should have a zero within standard deviation; if they don't, something went wrong in the coding.
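Outside Stata, the same check is easy to do by hand. Here is a minimal Python/pandas sketch (the data frame and the educ variable are made up for illustration) that computes the within standard deviation that xtsum reports:

```python
import pandas as pd

# Toy panel: 'educ' is meant to be time-invariant, but unit 2 has a coding error.
df = pd.DataFrame({
    "id":   [1, 1, 1, 2, 2, 2],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "educ": [12, 12, 12, 16, 16, 12],   # varies within unit 2 -> suspicious
    "wage": [10.0, 10.5, 11.0, 20.0, 21.0, 19.5],
})

# Within SD: spread of each observation around its own unit mean.
# This is what Stata's xtsum reports as the "within" standard deviation.
within_sd = df.groupby("id")["educ"].transform(lambda s: s - s.mean()).std(ddof=1)
print(f"within SD of educ: {within_sd:.3f}")  # nonzero -> educ is not time-invariant
```

A truly time-invariant variable would give a within SD of exactly zero here; any positive value flags a unit whose "constant" changes over time.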
Having a negative Hausman test statistic is a bad sign because the matrices the test is built on are positive semi-definite, so the theoretical value of the statistic is non-negative. Negative values point towards model misspecification or too small a sample (related to this is this question).
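To see where the sign issue comes from, the statistic is $H = (\hat\beta_{FE} - \hat\beta_{RE})' [\hat{V}_{FE} - \hat{V}_{RE}]^{-1} (\hat\beta_{FE} - \hat\beta_{RE})$. A small numpy sketch with made-up coefficient estimates shows the computation; in theory the variance difference is positive semi-definite, but the estimated difference need not be, which is how negative values arise:

```python
import numpy as np

# Hypothetical FE and RE estimates for two coefficients (made-up numbers).
b_fe = np.array([0.50, 1.20])
b_re = np.array([0.45, 1.10])
V_fe = np.array([[0.040, 0.002],
                 [0.002, 0.090]])
V_re = np.array([[0.010, 0.001],
                 [0.001, 0.030]])

# Hausman statistic: quadratic form in the coefficient difference.
d = b_fe - b_re
V_diff = V_fe - V_re
H = d @ np.linalg.inv(V_diff) @ d
print(f"H = {H:.3f}")

# Under the null, RE is efficient, so V_fe - V_re is positive semi-definite
# and H >= 0. With *estimated* matrices this can fail: if V_diff has a
# negative eigenvalue, H can come out negative in finite samples.
eigvals = np.linalg.eigvalsh(V_diff)
print("eigenvalues of V_fe - V_re:", eigvals)
```

Checking the eigenvalues of the estimated variance difference is a quick way to diagnose whether a negative statistic is this finite-sample artifact.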
If you cluster your standard errors you also need a modified version of the Hausman test. This is implemented in the xtoverid
command. You can use it like this:
xtreg ln_r_prisperkg_Frst_102202 Dflere_mottak_tur i.landingsfylkekode i.kvartiler_ny markedsk_torsk gjenv_TAC_NØtorsk_år_prct lalder_fartøy i.fangstr r_minst_Frst_torsk gjenv_kvote_NØtorsk_fartøy_prct i.lengde_gruppering mobilitet, fe vce(cluster fartyid)
xtoverid
Rejecting the null rejects the validity of the assumptions underlying the random effects model.
The xtset
command only takes the unit id into account for fixed-effects estimation; the time variable does not eliminate time fixed effects. So
xtset id time
xtreg y x, fe
will give you the exact same results as
xtset id
xtreg y x, fe
The time variable is only relevant for commands for which the sorting order of the data matters; for instance xtserial
, which tests for serial correlation in panel data, requires it. This has been discussed here. So if you want to include time fixed effects, you need to include the day dummies separately via i.day
, for example. In this context, the season and year dummies make sense, so it's good that you use them.
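A quick simulation makes the point concrete. This is a Python/numpy sketch with made-up parameters: unit demeaning alone (what xtreg, fe does) leaves the time effects in, while period dummies (implemented here, equivalently for a balanced panel, as additional demeaning by period) recover the true coefficient:

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods = 200, 6
beta = 1.0

# Simulated panel with unit effects a_i and a strong common time effect d_t;
# x is correlated with d_t, so omitting time effects biases the estimate.
a = rng.normal(size=n_units)                         # unit fixed effects
d = np.array([0., 2., 4., 6., 8., 10.])              # time effects
x = rng.normal(size=(n_units, n_periods)) + 0.5 * d
e = rng.normal(scale=0.5, size=(n_units, n_periods))
y = beta * x + a[:, None] + d[None, :] + e

# Within (unit-demeaning) estimator: removes a_i but NOT d_t.
xw = x - x.mean(axis=1, keepdims=True)
yw = y - y.mean(axis=1, keepdims=True)
b_no_time = (xw * yw).sum() / (xw**2).sum()

# Additionally subtract period means: removes d_t as well, which is what
# including time dummies (i.day etc.) achieves in a balanced panel.
xw2 = xw - xw.mean(axis=0, keepdims=True)
yw2 = yw - yw.mean(axis=0, keepdims=True)
b_time = (xw2 * yw2).sum() / (xw2**2).sum()

print(f"FE without time dummies: {b_no_time:.3f}")  # biased upward here
print(f"FE with time dummies:    {b_time:.3f}")     # close to true beta = 1
```

The gap between the two estimates is entirely due to the time effects that unit demeaning leaves behind.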
Both approaches, using group fixed effects and/or cluster-adjusted standard errors, address different issues related to clustered (or panel) data, and I would clearly view them as distinct approaches. Often you want to use both of them:
First of all, cluster-adjusted standard errors account for within-cluster correlation or heteroscedasticity, which the fixed-effects estimator does not take into account unless you are willing to make further assumptions; see the Imbens and Wooldridge lecture slides for a good discussion of short and long panels and various issues related to this problem. There is also a recent paper on this topic by Cameron and Miller, A Practitioner's Guide to Cluster-Robust Inference, which might be interesting for you. If you do not want to model the variance-covariance matrix and you suspect that within-cluster correlation is present, I advise using cluster-robust standard errors, because the bias in your SE may be severe (much more problematic than for heteroscedasticity; see Angrist & Pischke Chapter III.8 for a discussion of this topic). But you need enough clusters (Angrist and Pischke suggest 40-50 as a rule of thumb). Cluster-adjusted standard errors leave your point estimates unchanged and only adjust the standard errors (which will usually go up)!
Fixed-effects estimation takes into account unobserved time-invariant heterogeneity (as you mentioned). This can be good or bad: on the one hand, you need fewer assumptions to get consistent estimates. On the other hand, you throw away a lot of variation which might be useful. Some people like Andrew Gelman prefer hierarchical modeling to fixed effects, but here opinions differ. Fixed-effects estimation will change both point and interval estimates (here, too, standard errors will usually be higher).
So to sum up: cluster-robust standard errors are an easy way to account for possible issues related to clustered data if you do not want to bother with modeling inter- and intra-cluster correlation (and enough clusters are available). Fixed-effects estimation will use only certain variation, so it depends on your model whether you want to base estimates on less variation or not. But without further assumptions, fixed-effects estimation will not take care of the problems that intra-cluster correlation causes for the variance matrix. Neither will cluster-robust standard errors take into account the problems related to the use of fixed-effects estimation.
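To make the mechanics of cluster adjustment concrete, here is a numpy sketch (simulated data, made-up parameters) of the cluster-robust sandwich variance: the scores are summed within each cluster before forming the "meat", the OLS point estimates are untouched, and with a cluster-level error component the clustered standard errors come out larger than the classical ones:

```python
import numpy as np

rng = np.random.default_rng(1)
n_clusters, m = 50, 20                      # 50 clusters, 20 obs each
G = np.repeat(np.arange(n_clusters), m)

# Regressor and error both contain a cluster-level component, so errors are
# correlated within clusters and classical OLS standard errors are too small.
xc = rng.normal(size=n_clusters)
uc = rng.normal(size=n_clusters)
x = xc[G] + rng.normal(size=n_clusters * m)
u = uc[G] + rng.normal(size=n_clusters * m)
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones_like(x), x])
b = np.linalg.solve(X.T @ X, X.T @ y)       # OLS point estimates (unchanged below)
e = y - X @ b
XtX_inv = np.linalg.inv(X.T @ X)

# Classical (homoscedastic) variance estimate.
V_classic = e @ e / (len(y) - 2) * XtX_inv

# Cluster-robust sandwich: sum the score X_g' e_g over each cluster first.
meat = np.zeros((2, 2))
for g in range(n_clusters):
    Xg, eg = X[G == g], e[G == g]
    sg = Xg.T @ eg
    meat += np.outer(sg, sg)
V_cluster = XtX_inv @ meat @ XtX_inv

print("classical SE(x):", np.sqrt(V_classic[1, 1]))
print("clustered SE(x):", np.sqrt(V_cluster[1, 1]))  # larger here
```

Production code would add the usual small-sample degrees-of-freedom correction, but the sketch shows the key point: only the variance estimate changes, never the coefficients.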
Best Answer
If you are interested in this formulation for causal inference about $\beta$ then the unknown quantities represented by $c_i$ need only be stable for the duration of the study / data for fixed effects to identify the relevant causal quantity.
If you are concerned that the quantities represented by $c_i$ aren't stable even over this period then fixed effects won't do what you want. Then you can use random effects instead, although if you expect correlation between random $c_i$ and $X_i$ you'd want to condition $c_i$ on $\bar{X}_i$ in a multilevel setup. Concern about this correlation is often one of the motivations for a fixed effects formulation because under many (but not all) circumstances you don't need to worry about it then.
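This conditioning device (often attributed to Mundlak, also called the correlated-random-effects approach) can be sketched in a few lines of numpy; the data and parameters below are made up. In a balanced panel, a pooled regression of $y$ on $x$ and the unit mean $\bar{x}_i$ reproduces the within (fixed-effects) estimate of the coefficient on $x$ exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
n, t = 100, 5
G = np.repeat(np.arange(n), t)

c = rng.normal(size=n)                     # unit effects, correlated with x
x = 0.8 * c[G] + rng.normal(size=n * t)
y = 2.0 * x + c[G] + rng.normal(size=n * t)

# Within (fixed-effects) estimate of beta.
xbar = np.array([x[G == i].mean() for i in range(n)])[G]
ybar = np.array([y[G == i].mean() for i in range(n)])[G]
b_fe = ((x - xbar) * (y - ybar)).sum() / ((x - xbar) ** 2).sum()

# Mundlak device: pooled regression of y on a constant, x, AND the unit
# mean of x. Conditioning on xbar absorbs the part of c_i correlated with x.
X = np.column_stack([np.ones_like(x), x, xbar])
b_cre = np.linalg.solve(X.T @ X, X.T @ y)

print(f"within estimate:  {b_fe:.6f}")
print(f"Mundlak estimate: {b_cre[1]:.6f}")  # matches the within estimate
```

The exact equality (up to floating point) follows from the Frisch-Waugh logic: partialling the unit means out of $x$ leaves exactly the within-unit deviations.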
In short, your concern about variation in the quantities represented by $c_i$ is very reasonable, but mostly as it affects the data for the period you have rather than periods you might have had or that you may eventually have but don't.