Solved – Omitted Variable Bias in fixed effect regressions

bias, fixed-effects-model

Every researcher in economics always seems to argue that, as soon as you use a time and person fixed effects regression on panel data, you can be sure that there is no omitted variable bias. But is this really true?

Can't there still be a latent variable that influences all units of observation, or the independent variables of interest, in a similar but non-constant way, without being purely time- or person-dependent?

I am hesitant to believe/understand that year dummies control for all latent variables which slightly change over time.
I also found reference to this in this answer but I don't get it.

Best Answer

I find your question highly interesting, since I myself have had the same doubts. Here are some of my thoughts...

In a panel study, fixed effects control for every variable that is constant over time (e.g. sex) and for the stable part (the unit-specific mean) of every variable that changes. What remains is the time-varying part of each variable: it enters as an independent variable if the variable is in the model, or ends up in the error term (the idiosyncratic error) if the variable is unobserved. This is thought to reduce bias in the model.
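To make the point about the "stable part" concrete, here is a minimal simulation sketch using only the Python standard library. Everything in it is an assumption chosen for illustration, not something from the question: the data-generating process, the coefficient values, and the helper names `slope`, `demean_by_unit` and `simulate`. The true effect of x on y is set to 1. An unobserved time-constant trait `a` and an unobserved unit-specific, time-varying variable `w` both feed into x and y. Person fixed effects are applied through the within transformation (subtracting each unit's own mean). Year dummies are left out because `w` is not a shock common to all units, so they would not absorb it either.

```python
import random


def slope(x, y):
    """OLS slope of y on x (with intercept): cov(x, y) / var(x)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def demean_by_unit(values, n_units, n_periods):
    """Within transformation: subtract each unit's own time mean."""
    out = []
    for i in range(n_units):
        block = values[i * n_periods:(i + 1) * n_periods]
        m = sum(block) / n_periods
        out.extend(v - m for v in block)
    return out


def simulate(n_units=2000, n_periods=5, beta=1.0, gamma=0.0, seed=0):
    """Return (pooled OLS slope, within/fixed-effects slope).

    gamma scales the unobserved time-varying variable w (assumed setup).
    """
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n_units):
        a = rng.gauss(0, 1)  # unobserved, constant over time
        for _ in range(n_periods):
            w = rng.gauss(0, 1)  # unobserved, varies within the unit
            xi = a + gamma * w + rng.gauss(0, 1)
            yi = beta * xi + 2.0 * a + gamma * w + rng.gauss(0, 1)
            x.append(xi)
            y.append(yi)
    pooled = slope(x, y)
    within = slope(demean_by_unit(x, n_units, n_periods),
                   demean_by_unit(y, n_units, n_periods))
    return pooled, within


# Only the time-constant trait is omitted: the within estimator recovers beta.
pooled0, within0 = simulate(gamma=0.0)
# A time-varying omitted variable is added: the within estimator is biased too.
pooled1, within1 = simulate(gamma=1.0)

print(f"gamma=0: pooled={pooled0:.2f}, within={within0:.2f} (true beta = 1)")
print(f"gamma=1: pooled={pooled1:.2f}, within={within1:.2f} (true beta = 1)")
```

Under these assumed parameters, the within estimate should sit near 1 when only `a` is omitted, and near 1.5 once `w` is switched on. In other words, demeaning removes the stable part of the omitted variables, but whatever varies within a unit stays in the error term and can still bias the estimate.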

Causal graph theory is a modern school of causality. It has established rules for when a variable should be controlled for (see Elwert & Winship, 2014, Endogenous Selection Bias). Let's say we are interested in whether variable X causes variable Y. If a variable Z causes both X and Y (Z is a confounder), then Z will bias the estimated relationship X -> Y. This is solved by conditioning on Z in our regression.

Now let's say that we have the same variables, but Z is no longer the cause of X and Y; instead it is caused by them (Z is a collider). The correlations can look exactly the same as above; only the causal arrows are reversed. In this case Z should be left out of the equation. Including Z will invite bias into the model.

Finally, what if Z is caused by X and in turn causes Y (Z is a mediator)? In this case part of the total effect that X has on Y is indirect, through Z. If we control for Z by including it in our model, we only estimate the direct effect of X on Y. If we wanted to know the total effect of X on Y, we have invited a bias by including Z in the equation. This is called overcontrol bias.
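The three cases can be checked with a small simulation. This is only a sketch under assumed linear models with standard normal noise. The coefficients and the helper names (`coef_on_x`, `residualize`, `draw`) are made up for illustration. Controlling for Z is done by partialling Z out of both X and Y (the Frisch-Waugh approach), which gives the same coefficient on X as a regression of Y on X and Z.

```python
import random


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    xc, yc = center(x), center(y)
    return sum(a * b for a, b in zip(xc, yc)) / sum(a * a for a in xc)


def residualize(v, z):
    """Residuals from regressing v on z (with intercept)."""
    b = slope(z, v)
    vc, zc = center(v), center(z)
    return [a - b * c for a, c in zip(vc, zc)]


def coef_on_x(x, y, z=None):
    """Coefficient on x in a regression of y on x, optionally controlling for z."""
    if z is None:
        return slope(x, y)
    return slope(residualize(x, z), residualize(y, z))


rng = random.Random(1)
n = 20000


def draw():
    return [rng.gauss(0, 1) for _ in range(n)]


results = {}

# 1) Z causes X and Y (confounder). True effect of X on Y is 1.
z = draw()
x = [zi + u for zi, u in zip(z, draw())]
y = [xi + 2 * zi + e for xi, zi, e in zip(x, z, draw())]
results["confounder"] = (coef_on_x(x, y), coef_on_x(x, y, z))

# 2) Z is caused by X and Y (collider). True effect of X on Y is 1.
x = draw()
y = [xi + e for xi, e in zip(x, draw())]
z = [xi + yi + v for xi, yi, v in zip(x, y, draw())]
results["collider"] = (coef_on_x(x, y), coef_on_x(x, y, z))

# 3) X causes Z, Z causes Y (mediator). Direct effect 0.5, total effect 1.5.
x = draw()
z = [xi + v for xi, v in zip(x, draw())]
y = [0.5 * xi + zi + e for xi, zi, e in zip(x, z, draw())]
results["mediator"] = (coef_on_x(x, y), coef_on_x(x, y, z))

for name, (without_z, with_z) in results.items():
    print(f"{name:10s} without Z: {without_z:5.2f}   with Z: {with_z:5.2f}")
```

With these assumed coefficients, controlling for Z moves the estimate toward the truth in the first case (from about 2 to about 1). In the second case it moves the estimate away from the truth (from about 1 to about 0). In the third case it replaces the total effect (about 1.5) with the direct effect (about 0.5). The same control variable helps or hurts depending only on the direction of the arrows.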

Fixed effects are the econometric equivalent of a nuclear blast: you clean your model of the unit-specific mean of every conceivable variable. This is the source of the method's strength... and its horrible weakness. The model is normally used as a safety precaution when the researcher does not know whether there are any unmeasured confounders. However, doing so may invite more bias into the model than it removes. Furthermore, the stable part of any indirect effect between an independent and a dependent variable will be lost, which the researcher should at least recognise.

I recommend that you only use fixed effects when you are confident that the unobserved variables are ones that cause both the independent and dependent variables, and that none of them are caused by both the independent and dependent variables. To make this assessment you need to know which variables are part of the causal system, which is why you should draw your own causal graph before you choose your model. In some cases you might be better off using some other method; in other cases, using ordinary OLS regression and ignoring the potential threat of an unmeasured variable nobody can name anyway.
