How reasonable is the linearity assumption in regression analysis

economics, statistical-inference

As someone who moved from physics to economics, there are some things about regression analysis that have always bothered me. One of the most basic assumptions in econometrics is that for studying the effect of an independent variable $x$ on a dependent variable $y$, we can assume the following linear relationship between them:
$$y = \alpha + \beta x + e$$
where $e$ is an error term. The coefficients $\alpha$ and $\beta$ are then estimated by minimising the sum of squared residuals.

From a purely mathematical perspective, how reasonable is this assumption? Given any function $y=f(x)$ whose domain is restricted to a small number of real values (as it normally is in economics), when is it fine to assume linearity?

Also, is the nature of the variables in economics one of the reasons for this? That is, is it more appropriate to assume linearity when, say, $y$ is wage and $x$ is years of education than in a model of a physical system?

Best Answer

First off, econometrics does not assume that relationships between $x$ and $y$ in economics are linear in general. Rather, it says that if a relationship is linear, then you can use OLS to estimate these effects. Moreover, under the Gauss-Markov assumptions, OLS is the most efficient linear unbiased estimator in this case, as you have undoubtedly learned in econometrics.

Second, fair enough: if you look at the applied economics literature, researchers often do not bother to discuss whether the linearity assumption is plausible in their case, and very often it will not be. Still, sometimes the linearity assumption is innocuous. For example, if all your $x$ are dummy variables denoting group membership, then you are just estimating conditional group means. Or if the relationship is quadratic, or in fact any other higher-order polynomial, you can include the higher-order terms in your OLS regression and everything is fine.
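Both innocuous cases can be checked numerically. The following is a minimal sketch on simulated data, assuming NumPy is available; the sample size, coefficients, and noise levels are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# (1) A single 0/1 dummy regressor: OLS reproduces the two group means.
group = rng.integers(0, 2, size=n)               # hypothetical group membership
y = 1.0 + 2.0 * group + rng.normal(size=n)       # simulated outcome
X = np.column_stack([np.ones(n), group])         # intercept + dummy
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha_hat, beta_hat = coef
mean0 = y[group == 0].mean()
mean1 = y[group == 1].mean()
# alpha_hat matches mean0, and alpha_hat + beta_hat matches mean1.

# (2) A quadratic relationship: add x**2 as an extra regressor.
x = rng.uniform(-3.0, 3.0, size=n)
y2 = 0.5 - 1.0 * x + 0.8 * x**2 + rng.normal(scale=0.1, size=n)
X2 = np.column_stack([np.ones(n), x, x**2])      # still linear in the parameters
coef2, *_ = np.linalg.lstsq(X2, y2, rcond=None)
```

In case (1) the agreement with the group means is exact up to floating-point rounding; in case (2) the estimates in `coef2` should be close to the simulated coefficients 0.5, -1.0 and 0.8.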

More generally, I like to view OLS not as a linear model but as one that assumes additive separability: the model has to be linear in the parameters, not in the variables. For example, $y=\alpha+\beta \log(x_1)+\gamma x_2+\epsilon$ can still be estimated with OLS even though it is clearly not linear in $x_1$. And OLS is very often used to estimate such nonlinear relationships (e.g., logs when estimating elasticities).
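As a rough illustration of the log specification above, the sketch below simulates data from $y=\alpha+\beta \log(x_1)+\gamma x_2+\epsilon$ and recovers the coefficients with ordinary least squares. NumPy and all numerical values are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Simulated regressors; x1 is kept strictly positive so log(x1) is defined.
x1 = rng.uniform(1.0, 50.0, size=n)
x2 = rng.normal(size=n)

# True (made-up) parameters: alpha = 1.0, beta = 2.0, gamma = -0.5.
y = 1.0 + 2.0 * np.log(x1) - 0.5 * x2 + rng.normal(scale=0.2, size=n)

# The design matrix uses log(x1) as a column: nonlinear in x1,
# but linear in the parameters, so plain OLS applies.
X = np.column_stack([np.ones(n), np.log(x1), x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha_hat, beta_hat, gamma_hat = coef
```

The transformation happens before estimation, so the estimator never sees the nonlinearity; only the interpretation of `beta_hat` changes (a one-unit change in $\log(x_1)$ rather than in $x_1$).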

Third, if you are looking at a relationship where you believe that additive separability is not plausible, then there are other tools and you should use them, for example nonlinear least squares. Or, if the dependent variable is binary (zero-one), applied economists tend to use nonlinear logit regression more often than OLS.
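To make the nonlinear least squares option concrete, here is a small hand-rolled Gauss-Newton sketch for the hypothetical model $y = a\,e^{bx} + \epsilon$, which is not linear in the parameter $b$. The model, data, and helper `gauss_newton` are illustrative assumptions; in practice one would normally reach for a library routine rather than this loop.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 200)
a_true, b_true = 2.0, 0.5                        # made-up parameters
y = a_true * np.exp(b_true * x) + rng.normal(0.0, 0.05, size=x.size)

def gauss_newton(x, y, a, b, n_iter=50, tol=1e-12):
    """Minimise sum((y - a*exp(b*x))**2) by Gauss-Newton iterations."""
    for _ in range(n_iter):
        pred = a * np.exp(b * x)
        resid = y - pred
        # Jacobian of the prediction with respect to (a, b).
        J = np.column_stack([np.exp(b * x), a * x * np.exp(b * x)])
        # Each step is itself a linear least squares problem.
        step, *_ = np.linalg.lstsq(J, resid, rcond=None)
        a, b = a + step[0], b + step[1]
        if np.max(np.abs(step)) < tol:
            break
    return a, b

# Starting values from OLS on log(y), which is valid here because all y > 0.
slope, intercept = np.polyfit(x, np.log(y), 1)
a_hat, b_hat = gauss_newton(x, y, np.exp(intercept), slope)
```

Note that each Gauss-Newton step solves a linear least squares problem on the local linearisation of the model, so even the nonlinear estimator is built from repeated OLS-type fits.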

Fourth, there is a different approach in economics called "structural estimation". This is probably closer to what you are used to from physics. The idea here is that you write down an economic theory model and then estimate the parameters of this model empirically. This is very popular in the field of industrial organization. The relationship of two economic variables in such structural models will be linear only if they are also linear in the economic model based on the assumed utility functions, error distributions, etc.

Overall, I agree that economists tend to use OLS a lot, and sometimes in cases where they shouldn't. A main reason is probably indoctrination: OLS is covered in grad school while Poisson regression or other nonlinear models aren't. Another reason is that OLS is amazingly simple to interpret. Sometimes economists are aware that a relationship is not linear, but they estimate a linear model anyway because the resulting approximation is easier to interpret ("If you increase $x_1$ by 1, then your $y$ decreases by 0.3 on average!").
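How good such a linear approximation is depends heavily on the range of $x$ in the data. The sketch below, assuming NumPy and using $f(x)=e^x$ purely as a stand-in for an unknown nonlinear relationship, fits a straight line on a narrow and on a wide interval and compares the worst-case relative error. The helper `linear_fit_error` is hypothetical.

```python
import numpy as np

def linear_fit_error(f, lo, hi, n=200):
    """Fit y = intercept + slope * x by least squares on [lo, hi] and
    return the slope and the largest relative approximation error."""
    x = np.linspace(lo, hi, n)
    y = f(x)
    slope, intercept = np.polyfit(x, y, 1)
    fitted = intercept + slope * x
    max_rel_err = np.max(np.abs(y - fitted) / np.abs(y))
    return slope, max_rel_err

# Narrow range of x: the line is a close local approximation.
slope_narrow, err_narrow = linear_fit_error(np.exp, 1.0, 1.2)

# Wide range of x: the same functional form is badly approximated.
slope_wide, err_wide = linear_fit_error(np.exp, 0.0, 3.0)
```

On the narrow interval the fitted slope is close to the local derivative of $f$ and the relative error is small, so the "increase $x$ by 1" reading is a reasonable summary there. On the wide interval the slope is only an average effect and the line can be far off at the edges of the data.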
