Quick comments:
- I don't know where you're pulling the .5 from.
- If the variation of $u_i$ is small, you basically have a multicollinearity problem: $x_1$ and $x_2$ are for practical purposes almost the same variable.
- With $x_1$ and $x_2$ almost the same, what tends to happen when you regress $y$ on $x_1$ and $x_2$ is that the sum of the estimates $\hat{\beta}_1$ and $\hat{\beta}_2$ will be close to the true sum of $1$, but individually the estimates may be crazy! You might get $\hat{\beta}_1 = 2.25$ and $\hat{\beta}_2 = -1.24$: the sum is close to the true value of $1$, but individually they're way off the true values $\beta_1 = 1$, $\beta_2 = 0$. Furthermore, they will be highly sensitive to small changes in your data. You can simulate this to see. E.g., a small simulation I did:
MATLAB code to generate data:
n = 10000; x1 = randn(n, 1); x2 = x1 + randn(n, 1) * .01; y = x1 + randn(n, 1);
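The estimation results below come from an OLS regression of $y$ on $x_1$ and $x_2$ without an intercept; in MATLAB that's just b = [x1 x2] \ y.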
Estimation results:
run 1: b1 = 1.34 b2 = -0.35
run 2: b1 = 2.14 b2 = -1.14
run 3: b1 = .04 b2 = .94
Observe that the sum is always about 1, but that the individual estimates are massively imprecise and swing wildly between runs. If the noise $u_i$ is sufficiently small, you can't distinguish the explanatory effect of $x_1$ vs. $x_2$.
Furthermore, the individual estimates come out statistically insignificant! On the other hand, if you drop the $x_2$ variable, the t-stat on $x_1$ shoots up to like 100.
Now let's increase $n$ to 10 million.
run 1: b1 = .98 b2 = .018
run 2: b1 = .97 b2 = .023
run 3: b1 = 1.02 b2 = -.018
Eventually you can get n large enough to distinguish $x_1$ from $x_2$ in this setup, but $n$ needs to be obscenely large.
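If you want to reproduce this yourself, here's a minimal sketch of the whole simulation in base MATLAB (no toolboxes; standard errors computed by hand, no intercept, matching the setup above):

n = 10000;                              % try 1e7 to see the estimates settle down
for run = 1:3
    x1 = randn(n, 1);
    x2 = x1 + randn(n, 1) * .01;        % x2 is x1 plus tiny noise
    y  = x1 + randn(n, 1);              % true model: beta1 = 1, beta2 = 0
    X  = [x1 x2];
    b  = X \ y;                         % OLS estimates
    e  = y - X * b;                     % residuals
    s2 = (e' * e) / (n - 2);            % estimated error variance
    se = sqrt(diag(s2 * inv(X' * X)));  % standard errors
    t  = b ./ se;                       % t-statistics
    fprintf('run %d: b1 = %6.3f (t = %6.2f), b2 = %6.3f (t = %6.2f), sum = %5.3f\n', ...
        run, b(1), t(1), b(2), t(2), b(1) + b(2));
end

Dropping $x_2$ amounts to b = x1 \ y, and the t-stat on $x_1$ then becomes enormous, as described above.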
This is one of those theorems that is easier to prove in greater generality using vector algebra than it is to prove with scalar algebra. To do this, consider the multiple linear regression model $\mathbf{Y} = \mathbf{x} \boldsymbol{\beta} + \boldsymbol{\varepsilon}$ and the general linear estimator:
$$\hat{\boldsymbol{\beta}}_\mathbf{A}
= \hat{\boldsymbol{\beta}}_\text{OLS} + \mathbf{A} \mathbf{Y}
= [(\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} + \mathbf{A}] \mathbf{Y}.$$
Since the OLS estimator is unbiased and $\mathbb{E}(\mathbf{Y}) = \mathbf{x} \boldsymbol{\beta}$, this general linear estimator has bias:
$$\begin{align}
\text{Bias}(\hat{\boldsymbol{\beta}}_\mathbf{A}, \boldsymbol{\beta})
&\equiv \mathbb{E}(\hat{\boldsymbol{\beta}}_\mathbf{A}) - \boldsymbol{\beta} \\[6pt]
&= \mathbb{E}(\hat{\boldsymbol{\beta}}_\text{OLS} + \mathbf{A} \mathbf{Y}) - \boldsymbol{\beta} \\[6pt]
&= \boldsymbol{\beta} + \mathbf{A} \mathbf{x} \boldsymbol{\beta} - \boldsymbol{\beta} \\[6pt]
&= \mathbf{A} \mathbf{x} \boldsymbol{\beta}, \\[6pt]
\end{align}$$
and so the requirement of unbiasedness imposes the restriction that $\mathbf{A} \mathbf{x} = \mathbf{0}$ (the bias $\mathbf{A} \mathbf{x} \boldsymbol{\beta}$ must vanish for every $\boldsymbol{\beta}$, which forces every column of $\mathbf{A} \mathbf{x}$ to be zero). The variance of the general linear estimator is:
$$\begin{align}
\mathbb{V}(\hat{\boldsymbol{\beta}}_\mathbf{A})
&= \mathbb{V}([(\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} + \mathbf{A}] \mathbf{Y}) \\[6pt]
&= [(\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} + \mathbf{A}] \mathbb{V}(\mathbf{Y}) [(\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} + \mathbf{A}]^\text{T} \\[6pt]
&= \sigma^2 [(\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} + \mathbf{A}] [(\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} + \mathbf{A}]^\text{T} \\[6pt]
&= \sigma^2 [(\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} + \mathbf{A}] [\mathbf{x} (\mathbf{x}^\text{T} \mathbf{x})^{-1} + \mathbf{A}^\text{T}] \\[6pt]
&= \sigma^2 [(\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} \mathbf{x} (\mathbf{x}^\text{T} \mathbf{x})^{-1} + (\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} \mathbf{A}^\text{T} + \mathbf{A} \mathbf{x} (\mathbf{x}^\text{T} \mathbf{x})^{-1} + \mathbf{A} \mathbf{A}^\text{T}] \\[6pt]
&= \sigma^2 [(\mathbf{x}^\text{T} \mathbf{x})^{-1} + (\mathbf{x}^\text{T} \mathbf{x})^{-1} (\mathbf{A} \mathbf{x})^\text{T} + (\mathbf{A} \mathbf{x}) (\mathbf{x}^\text{T} \mathbf{x})^{-1} + \mathbf{A} \mathbf{A}^\text{T}] \\[6pt]
&= \sigma^2 [(\mathbf{x}^\text{T} \mathbf{x})^{-1} + (\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{0}^\text{T} + \mathbf{0} (\mathbf{x}^\text{T} \mathbf{x})^{-1} + \mathbf{A} \mathbf{A}^\text{T}] \\[6pt]
&= \sigma^2 [(\mathbf{x}^\text{T} \mathbf{x})^{-1} + \mathbf{A} \mathbf{A}^\text{T}]. \\[6pt]
\end{align}$$
Hence, we have:
$$\mathbb{V}(\hat{\boldsymbol{\beta}}_\mathbf{A}) - \mathbb{V}(\hat{\boldsymbol{\beta}}_\text{OLS}) = \sigma^2 \mathbf{A} \mathbf{A}^\text{T}.$$
Now, since $\mathbf{A} \mathbf{A}^\text{T}$ is a positive semi-definite matrix, the variance of the general linear estimator exceeds the OLS variance by a positive semi-definite matrix, and it is minimised when $\mathbf{A} = \mathbf{0}$, which yields the OLS estimator.
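As a quick numerical sanity check of this decomposition (a sketch, not part of the proof), you can build an arbitrary $\mathbf{A}$ satisfying $\mathbf{A} \mathbf{x} = \mathbf{0}$ by multiplying any matrix by the annihilator of $\mathbf{x}$, and verify the identity in MATLAB:

n = 50; k = 3; sigma2 = 2;
x = randn(n, k);                          % arbitrary design matrix
M = eye(n) - x * ((x' * x) \ x');         % annihilator: M * x = 0
A = randn(k, n) * M;                      % hence A * x = 0
B = (x' * x) \ x' + A;                    % weights of the general linear estimator
V_general = sigma2 * (B * B');            % V(beta_A)
V_ols     = sigma2 * inv(x' * x);         % V(beta_OLS)
norm(V_general - V_ols - sigma2 * (A * A'))   % ~ 0 up to rounding
min(eig(A * A'))                              % >= 0: A*A' is positive semi-definite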
Best Answer
Your model is equivalent to: $$y_i=\alpha x_i+ \beta x_i^2+u_i x_i .$$
Write $\varepsilon_i = u_i x_i$, so that the model becomes: $$y_i = \alpha x_i + \beta x_i^2 + \varepsilon_i .$$
So the two differences with a usual linear regression model are:
- there is no constant (intercept) term, and
- the error term is $\varepsilon_i = u_i x_i$ rather than $u_i$ itself, so it depends on $x_i$.
I don't think that the absence of a constant term is a big deal (maybe I'm wrong...), but the other point is.
To have the Gauss-Markov theorem, you want the $\varepsilon_i$ to have:
- zero mean,
- constant variance, and
- zero correlation across observations.
In order to have that, you need assumptions on your $x_i$ (which you don't need for the Gauss-Markov theorem in a classical linear regression). Such an assumption could be that the $x_i$ are independent of the $u_i$ and have constant mean and variance. Then the $\varepsilon_i$ will satisfy the three conditions.
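To spell out why (a quick check, assuming the $u_i$ have mean zero and variance $\sigma_u^2$, are independent of the $x_i$, and everything is independent across observations):
$$\mathbb{E}(\varepsilon_i) = \mathbb{E}(u_i)\,\mathbb{E}(x_i) = 0, \qquad \mathbb{V}(\varepsilon_i) = \mathbb{E}(u_i^2)\,\mathbb{E}(x_i^2) = \sigma_u^2\,\mathbb{E}(x_i^2),$$
which is the same for every $i$ when the $x_i$ have constant mean and variance, and $\operatorname{Cov}(\varepsilon_i, \varepsilon_j) = \mathbb{E}(u_i)\,\mathbb{E}(u_j)\,\mathbb{E}(x_i x_j) = 0$ for $i \neq j$.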
So, as Nick Cox commented, the answer is: it depends. If you have a fixed design (the $x_i$ are non-random constants), then no (the $\varepsilon_i$ won't have constant variance). If you suspect the $u_i$ are not independent of the $x_i$, then no again. But if the $x_i$ are i.i.d. and independent of the $u_i$, then I think yes.
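In the fixed-design case the problem is easy to see directly: with non-random $x_i$,
$$\mathbb{V}(\varepsilon_i) = x_i^2\,\mathbb{V}(u_i) = x_i^2\,\sigma_u^2,$$
which varies with $i$ unless all the $x_i$ have the same magnitude, so the constant-variance condition fails.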