Endogeneity & IV = model misspecification

Tags: causality, consistency, endogeneity, instrumental-variables, misspecification

I'd like to raise a controversial point: if you need instrumental variables, your model is wrong.

Basic endogeneity problem and the IV solution

Consider the basic framework of endogeneity and instrumental variables (IV): we want to estimate the regression function $E(y|x,z)$, and choose a linear model of $y$ on $x$ and $z$, where $x$ is endogenous and $z$ is not. Endogeneity is a serious problem and can stem from many causes: reverse causation, simultaneity, omitted variables, measurement error correlated with $y$, model misspecification, etc.

$y_i= \beta_0 + \beta_1 x_i + \beta_2 z_i + \varepsilon_i$ (1)

$E(\varepsilon|x) \neq 0$, $E(\varepsilon|z) = 0$
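
To spell out why this breaks OLS (this is just the standard omitted-variable / Frisch–Waugh–Lovell argument, with $\tilde x$ denoting the residual of $x$ after projecting it on a constant and $z$):

$\hat{\beta_1}_{OLS} \overset{p}{\longrightarrow} \beta_1 + \frac{Cov(\tilde x, \varepsilon)}{Var(\tilde x)}$

and the second term is generically non-zero when $E(\varepsilon|x) \neq 0$.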

The OLS estimate of the $x$ parameter, $\hat{\beta_1}_{OLS}$, is therefore biased and inconsistent because of the endogeneity of $x$. We thus turn to an instrument, say $w$, which should be (ideally strongly) correlated with $x$ (conditional on the other regressors) and uncorrelated with the error term $\varepsilon$ of the previous model; formally:

$E(x|w) \neq E(x)$, $E(x|z,w) \neq E(x|z)$, $E(\varepsilon|w) = 0$

The IV method can be seen as replacing $x$ by $\hat x_{z,w}$ in model (1), where $\hat x_{z,w}$ is the prediction from a linear regression of $x$ on $z$ and $w$ (the first stage of two-stage least squares, 2SLS). OLS on this new model yields a consistent estimate $\hat{\beta_1}_{IV}$ of the parameter of interest $\beta_1$.
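
For concreteness, here is a minimal simulation sketch of that two-stage construction (the data-generating process, sample size and coefficient values are illustrative choices, not part of the argument): an unobserved component $u$ enters both $x$ and $\varepsilon$, so OLS on model (1) is inconsistent, while the two-stage estimate recovers $\beta_1$.

```python
# Minimal 2SLS sketch: illustrative DGP, not taken from any real dataset.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta0, beta1, beta2 = 1.0, 2.0, -0.5

u = rng.normal(size=n)                      # unobserved confounder
z = rng.normal(size=n)                      # exogenous regressor
w = rng.normal(size=n)                      # instrument: moves x, not eps
x = 0.8 * w + 0.5 * z + u + rng.normal(size=n)    # x endogenous via u
eps = u + rng.normal(size=n)                # error correlated with x
y = beta0 + beta1 * x + beta2 * z + eps

def ols(X, target):
    """Least-squares coefficients of target on the columns of X."""
    return np.linalg.lstsq(X, target, rcond=None)[0]

ones = np.ones(n)

# OLS of y on (1, x, z): the x coefficient is biased (about 2.4 here, not 2.0)
b_ols = ols(np.column_stack([ones, x, z]), y)

# Stage 1: fitted values of x from a regression on (1, z, w)
Z = np.column_stack([ones, z, w])
x_hat = Z @ ols(Z, x)

# Stage 2: OLS of y on (1, x_hat, z) gives the IV / 2SLS estimate of beta_1
b_iv = ols(np.column_stack([ones, x_hat, z]), y)

print("true beta1:", beta1)
print("OLS  beta1:", round(b_ols[1], 3))
print("2SLS beta1:", round(b_iv[1], 3))    # close to 2.0
```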

Interpretation of IV estimates

Notice that the IV fit necessarily has a worse in-sample fit than the OLS fit, because by construction OLS attains the lowest in-sample mean squared error among all linear fits on the same regressors. In econometrics this is justified as a trade-off between goodness of fit and consistency of the parameter estimates: the main interest lies not in prediction but in the estimated values of certain parameters, usually in linear models derived from economic theory.
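
To check this numerically, here is a small self-contained sketch (again with an illustrative data-generating process): the OLS coefficients minimise the in-sample sum of squared residuals for the chosen regressors, so evaluating the IV coefficients in the same linear form can only fit the sample worse.

```python
# In-sample fit of OLS vs IV coefficients on the same linear form (toy DGP).
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
u, z, w = rng.normal(size=(3, n))
x = w + 0.5 * z + u + rng.normal(size=n)          # endogenous regressor
y = 1.0 + 2.0 * x - 0.5 * z + u + rng.normal(size=n)

ones = np.ones(n)
X = np.column_stack([ones, x, z])                 # regressors of model (1)
Z = np.column_stack([ones, z, w])                 # instruments (z instruments itself)

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
b_iv = np.linalg.lstsq(np.column_stack([ones, x_hat, z]), y, rcond=None)[0]

def sse(b):
    """Sum of squared residuals using the actual regressors X."""
    return float(np.sum((y - X @ b) ** 2))

print("SSE with OLS coefficients:", round(sse(b_ols)))   # smallest possible for X
print("SSE with IV  coefficients:", round(sse(b_iv)))    # weakly larger
```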

My problem is that most econometric analyses interpret these parameters of interest ($\beta_1$ in model 1) as the partial derivative $\frac{\partial y}{\partial x}$, the mean effect of a change in $x$ on $y$. This is obviously wrong, because in order to compute $\hat{\beta_1}_{IV}$ we deliberately ignore the non-linear effect that $x$ has on $y$ through $\varepsilon$, restricting ourselves to the $\beta_1$ component. This is an important point to make, given the high regard in which IV estimates are held by academics and policy makers.

Of course one could argue that the goal of modelling is to approximate reality, and that capturing the isolated linear effect is better than nothing. The fact of the matter, I would argue, is that if modelling and the estimation of $\frac{\partial y}{\partial x}$ are to be taken seriously, they should not proceed by sweeping under the rug the relationships that are present in the data but do not conform to the model specification.

If you need IV, your model is wrong

If the point of interest is the estimation of the effect of $x$ on $y$ (given the other regressors), and if $x$ is endogenous in the linear model, then the specified linear model is simply wrong, because it violates the assumption that $\varepsilon$ is uncorrelated with the regressors. Using IV seems to amount to ignoring this problem and returning the "pure" linear effect of an erroneous model, which sounds rather insulting to the scientific spirit. At best, it can be seen as a truncated Taylor expansion, which will only be valid close to the mean point of the dataset.

So rather than resorting to IV when confronted with endogeneity, it seems to me one should aim to find a better model of the relationship. A great many "universal approximation" models are readily available (kernel regression, lasso, gradient boosting, neural networks…) and can be used as benchmarks against which to compare the predictive power of any specified model, as sketched below. If a model has poor predictive power, it is a poor representation of reality.
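
As a sketch of what I mean by benchmarking (simulated data and scikit-learn estimators chosen purely for illustration; the non-linear truth below is my assumption), one can compare the out-of-sample $R^2$ of a linear specification against a flexible learner such as gradient boosting:

```python
# Benchmark a linear specification against a flexible learner (illustrative data).
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 5_000
x, z = rng.normal(size=(2, n))
y = np.sin(2 * x) + 0.5 * z + rng.normal(scale=0.3, size=n)   # non-linear truth
X = np.column_stack([x, z])

linear = LinearRegression()
boosted = HistGradientBoostingRegressor(random_state=0)

# 5-fold cross-validated R^2: a large gap signals that the linear
# specification leaves systematic structure unexplained.
print("linear  R^2:", cross_val_score(linear, X, y, cv=5).mean().round(3))
print("boosted R^2:", cross_val_score(boosted, X, y, cv=5).mean().round(3))
```

If the flexible benchmark beats the specified model by a wide margin, that is evidence of misspecification rather than of irreducible noise.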

If the final aim is to estimate the "full" partial effect of $x$ on $y$, algorithms such as local kernel-weighted regressions can be run at the points of interest, and their slope parameters can be interpreted as local partial derivatives (a sketch follows).
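
For illustration, here is a minimal local linear (kernel-weighted) regression; the Gaussian kernel, bandwidth and evaluation points are arbitrary choices of mine. The local slope at $x_0$ estimates the derivative of $E(y|x)$ at $x_0$ (in this toy example there is no endogeneity, so that derivative is the effect of interest):

```python
# Local linear regression: the slope at x0 estimates dE[y|x]/dx at x0.
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
x = rng.normal(size=n)
y = np.sin(2 * x) + rng.normal(scale=0.3, size=n)   # true derivative: 2*cos(2*x)

def local_slope(x0, x, y, h=0.2):
    """Weighted least squares of y on (1, x - x0) with Gaussian kernel weights."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    X = np.column_stack([np.ones_like(x), x - x0])
    XtW = X.T * w                                    # weight each observation
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta[1]                                   # local slope at x0

for x0 in (-1.0, 0.0, 1.0):
    print(f"x0={x0:+.1f}  estimated slope={local_slope(x0, x, y):+.2f}"
          f"  true derivative={2 * np.cos(2 * x0):+.2f}")
```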

And if one is stuck with an endogeneity-ridden linear model because of economic theory, then IV estimates will only be gross approximations of the true effects, and conclusions will only be misleading, right?

Any comments, or references to related literature? (sorry for the long post)

Best Answer

You raise too many issues, and for most of them any answer will probably be seen as being primarily opinion-based, so this post may be quickly closed.

To comment on a few of them:

Quote: If a model has poor predictive power, it is a poor representation of reality.

This would be true in the natural sciences, where physical laws appear not to change over the years. But in the social sciences, "laws" change over the years or across different socio-economic environments. Example: the econometric models that failed to cope with the oil crisis of '73 became useless during and after the crisis. They had been doing a pretty good job up to that point, and even today they remain pretty adequate models of the economy before that crisis. "Reality" is not the same as "future".

Quote: (The IV estimator)...At best, it can be seen as a truncated Taylor expansion, which will only be valid close to the mean point of the dataset.

and

Quote: ... IV estimates will only be gross approximations of the true effects, and conclusions will only be misleading, right?

...which implies that you are somehow certain that the correlation of the regressor with the error term is in most cases "large" and "strong". Is there evidence of such?

Mind you, I am not in love with IV estimation; I am just trying to point out to you aspects of your deliberations that could be made more robust.