Solved – Are the relations in fixed, random and mixed effect models and multilevel models causal

causality, fixed-effects-model, mixed-model, random-effects-model

In fixed, random and mixed effect models, and multilevel models, the response random variable is represented as a function of some explanatory variables and random errors. I was wondering if the relations implied by them are considered causal, and therefore used in causal inference? Thanks!

Best Answer

Whether a coefficient from a model has a causal interpretation mostly depends on which other variables are included and on how unobserved but relevant variables are controlled for. For example, consider an earnings regression of the type $$\ln(y_{i}) = \alpha + \delta S_{i} + \gamma A_{i} + X_{i}'\beta + \epsilon_{i}$$ where the dependent variable is log earnings, $S_{i}$ is years of education, $A_{i}$ is ability and $X_{i}$ contains other relevant variables that affect wages, like parental background, age, gender, etc.

Assume $A_{i}$ and $S_{i}$ are correlated and that there are no other endogeneity issues or measurement error. If you can observe $S_{i}$, $A_{i}$ and $X$, then the coefficient $\delta$ has a causal interpretation, i.e. it is the causal effect of an additional year of education on log earnings, holding all else constant. It is this ceteris paribus condition that gives the coefficient its causal meaning.
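To make the role of $A_{i}$ concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration, not something estimated from real data: the sample size, the true values $\delta = 0.08$ and $\gamma = 0.5$, the way schooling depends on ability, and the helper names `simulate_earnings` and `ols`. It compares the regression that controls for ability with the one that omits it.

```python
import numpy as np


def simulate_earnings(n=50_000, delta=0.08, gamma=0.5, seed=0):
    """Simulate log earnings where schooling is correlated with ability.

    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    ability = rng.normal(size=n)
    # Schooling depends on ability, so the two are correlated.
    schooling = 12 + 2 * ability + rng.normal(size=n)
    log_earn = (1.0 + delta * schooling + gamma * ability
                + rng.normal(scale=0.5, size=n))
    return log_earn, schooling, ability


def ols(y, *regressors):
    """Least-squares coefficients; index 0 is the intercept."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


log_earn, schooling, ability = simulate_earnings()

# Ability observed and controlled for: the schooling coefficient targets delta.
delta_long = ols(log_earn, schooling, ability)[1]

# Ability omitted: it sits in the error term and biases the coefficient.
delta_short = ols(log_earn, schooling)[1]

print(f"with ability:    {delta_long:.3f}")
print(f"without ability: {delta_short:.3f}")
```

Under these assumed values the omitted-variable formula gives a population coefficient of $\delta + \gamma \cdot \mathrm{Cov}(S, A)/\mathrm{Var}(S) = 0.08 + 0.5 \cdot 2/5 = 0.28$ for the short regression, so the estimate without ability should land near 0.28 rather than 0.08.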

To extend this example to your fixed effects model: if you have panel data and you don't observe $A_{i}$, you can still consistently estimate $\delta$ using fixed effects. Add a time index $t$ and suppose schooling $S_{it}$ varies over time while $A_{i}$ does not. Then $$\ln(y_{it}) = \eta_{i} + \delta S_{it} + X_{it}'\beta + \epsilon_{it}$$ where the individual fixed effect $\eta_{i} = \alpha + \gamma A_{i} + G_{i}$ absorbs everything, observed or unobserved, that does not vary over time: the intercept, ability, and $G_{i}$, the combined contribution of variables such as gender or place of birth. This pulls $A_{i}$ out of the error and hence removes the endogeneity problem (remember that $A_{i}$ and schooling are correlated, so if $A_{i}$ were left in the error, $S_{it}$ would be correlated with the error). The problem is that $A_{i}$ is unlikely to be truly fixed over time, since for instance mental capabilities and productivity diminish with old age.
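The fixed effects argument can be sketched with simulated panel data. This is only an illustration under stated assumptions: ability is exactly constant over time, schooling varies within each person, the true $\delta$ is 0.08, and the helper names `simulate_panel`, `within` and `slope` are made up for this example. The fixed effects estimate is computed with the within transformation, i.e. by subtracting each person's time average before running the regression.

```python
import numpy as np


def simulate_panel(n_people=2_000, n_periods=5, delta=0.08, gamma=0.5, seed=1):
    """Simulate a balanced panel with unobserved, time-invariant ability.

    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    ability = rng.normal(size=(n_people, 1))  # fixed over time, never observed
    # Schooling varies over time and is correlated with ability.
    schooling = 12 + 2 * ability + rng.normal(size=(n_people, n_periods))
    log_earn = (1.0 + delta * schooling + gamma * ability
                + rng.normal(scale=0.5, size=(n_people, n_periods)))
    return log_earn, schooling


def within(a):
    """Within transformation: subtract each person's average over time."""
    return a - a.mean(axis=1, keepdims=True)


def slope(y, x):
    """Slope of a simple regression of y on x (with intercept)."""
    x_c = x - x.mean()
    y_c = y - y.mean()
    return float((x_c * y_c).sum() / (x_c ** 2).sum())


log_earn, schooling = simulate_panel()

# Pooled OLS ignores the individual effect, so ability stays in the error.
delta_pooled = slope(log_earn.ravel(), schooling.ravel())

# Demeaning removes anything constant within a person, including ability.
delta_fe = slope(within(log_earn).ravel(), within(schooling).ravel())

print(f"pooled OLS:    {delta_pooled:.3f}")
print(f"fixed effects: {delta_fe:.3f}")
```

In this setup the fixed effects estimate should be close to the assumed 0.08 while pooled OLS stays biased upward. If ability were allowed to drift over time in the simulation, demeaning would no longer remove it completely, which is exactly the caveat raised above.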

In theory, I could go on providing examples for each of the model types you mention, but I guess you get the idea. Whether or not you estimate a causal effect depends on the included (and omitted!) variables and on the assumptions of the model. So look at what kind of data you have at hand, which relevant variables you can control for in the relationship you are after (perhaps you don't even have an endogeneity problem), and which assumptions are realistic enough for your analysis to be credible. If you want to dig a little deeper into the estimation of causal effects, Mostly Harmless Econometrics by Angrist and Pischke is an excellent book. Otherwise you will find plenty of lecture notes online.
