Will the estimated slope coefficient $\beta_1$ always be the same for OLS and for QR for different quantiles?
No, of course not, because the empirical loss function being minimized differs across these cases (OLS vs. QR at different quantiles).
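For concreteness, OLS minimizes the sum of squared residuals $\sum_i (y_i - x_i^\top b)^2,$ while QR at quantile $\tau$ minimizes the check-loss criterion
$$\sum_i \rho_\tau\left(y_i - x_i^\top b\right), \qquad \rho_\tau(u) = u\left(\tau - \mathbf{1}\{u < 0\}\right),$$
which at $\tau = 1/2$ is just half the sum of absolute residuals (median regression).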
I am well aware that in the presence of homoscedasticity all the slope parameters for different quantile regressions will be the same and that the QR models will differ only in the intercept.
No, not in finite samples. Here is an example taken from the help files of the quantreg
package in R:
library(quantreg)
data(stackloss)
rq(stack.loss ~ stack.x, tau = 0.50)  # median (L1) regression fit for the stackloss data
rq(stack.loss ~ stack.x, tau = 0.25)  # the first-quartile fit
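To see the difference directly, one can compare the coefficients with an OLS fit (a quick check; coef() extracts the estimates):

coef(lm(stack.loss ~ stack.x))              # OLS slopes for comparison
coef(rq(stack.loss ~ stack.x, tau = 0.50))  # median-regression slopes
coef(rq(stack.loss ~ stack.x, tau = 0.25))  # first-quartile slopes

In this data set all three sets of slope estimates differ.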
However, asymptotically (and under homoscedasticity, i.e. a pure location-shift model) they will all converge to the same true slope.
But in the case of homoscedasticity, shouldn't outliers cancel each other out, because positive errors are as likely as negative ones, rendering the OLS and median-QR slope coefficients equivalent?
No. First, perfect symmetry of the errors is not guaranteed in any finite sample. Second, minimizing a sum of squared residuals versus a sum of absolute residuals will in general lead to different estimates, even for symmetric errors.
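A quick simulation illustrates the second point (a minimal sketch; the seed, sample size, and coefficients are arbitrary):

library(quantreg)
set.seed(1)                      # arbitrary seed, for reproducibility
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)       # homoscedastic, symmetric (Gaussian) errors
coef(lm(y ~ x))[2]               # OLS slope estimate
coef(rq(y ~ x, tau = 0.5))[2]    # median-regression slope estimate

Both are consistent for the true slope of 2, yet in any given sample they differ.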
We need to take some care with the notation because the models differ.
Let the first (correct) model be
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon\tag{1}$$
where the $\varepsilon_i$ have a common variance and zero means, and write the second model (which governs the very same variable $Y,$ so there is no need to change its name) as
$$Y = \alpha_0 + \alpha_1 X_1 + \delta.\tag{2}$$
As an aside, we cannot impose any additional assumptions on $\delta,$ because these random variables are completely determined by equating the two right-hand sides (which, after all, both equal $Y$):
$$\delta = (\beta_0 - \alpha_0) + (\beta_1 - \alpha_1)X_1 + \beta_2 X_2 + \varepsilon.$$
(From now on I will drop the generic discussion of models to focus on a dataset with explanatory values $x_{1i}$ and $x_{2i},$ responses $y_i,$ and associated errors $\varepsilon_i$ and $\delta_i.$)
We can infer, however, that the $\delta_i$ all have the same variance as the $\varepsilon_i,$ and that their means are
$$E[\delta_i] = (\beta_0 - \alpha_0) + (\beta_1 - \alpha_1)x_{1i} + \beta_2 x_{2i},$$
which may vary among observations.
Let's return to the analysis.
Fitting the second model gives the slope estimate
$$\hat\alpha_1 = \frac{\sum_{i} (y_i - \bar y)(x_{1i} - \bar{x}_1)}{\sum_{i} (x_{1i} - \bar{x}_1)^2}.\tag{*}$$
This is a linear combination of the $y_i-\bar y,$ so use the zero-mean assumption about the $\varepsilon_i$ to compute
$$E[y_i - \bar y] = (\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}) -(\beta_0 + \beta_1 \bar{x}_1 + \beta_2 \bar{x}_2) = \beta_1(x_{1i}-\bar{x}_1) + \beta_2(x_{2i} - \bar{x}_2)$$
and apply linearity of expectation in $(*)$ to compute
$$E[\hat\alpha_1] = \beta_1 + \beta_2\frac{\sum_{i} (x_{2i}-\bar{x}_2)(x_{1i} - \bar{x}_1)}{\sum_{i} (x_{1i} - \bar{x}_1)^2}.$$
Comparing this with $\beta_1$ to assess the bias in using $\hat\alpha_1$ to estimate $\beta_1,$ we find that $\hat\alpha_1$ is unbiased if and only if the second term is zero. This can happen in two ways:
1. If $\beta_2 = 0.$ (This just means the second model is correct.)
2. If $\sum_{i} (x_{2i}-\bar{x}_2)(x_{1i} - \bar{x}_1)=0.$ This means the empirical covariance of the $x_1$ data and the $x_2$ data is zero: that is, the centered design vectors are orthogonal.
If neither of these is the case, the bias is nonzero. That agrees exactly with your intuition.
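A small simulation confirms the formula (a sketch; the design, coefficients, and seed are arbitrary choices):

set.seed(2)                                   # arbitrary seed
n <- 100
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)                     # correlated with x1, so the bias term is nonzero
b0 <- 1; b1 <- 2; b2 <- 3
alpha1.hat <- replicate(2000, {
  y <- b0 + b1 * x1 + b2 * x2 + rnorm(n)      # generate data from the correct model (1)
  coef(lm(y ~ x1))[2]                         # fit the misspecified model (2)
})
mean(alpha1.hat)                              # simulated E[alpha1.hat]
b1 + b2 * sum((x2 - mean(x2)) * (x1 - mean(x1))) /
  sum((x1 - mean(x1))^2)                      # the theoretical value derived above

The two numbers agree up to simulation error.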
Best Answer
Let me try it as follows (really not sure if that is useful intuition):
Based on my comment above, the correlation between the intercept and slope estimates will roughly be $$-\frac{E(X)}{\sqrt{E(X^2)}}.$$ Thus, if $E(X)>0$ instead of $E(X)=0$, most of the data will be clustered to the right of zero, and the formula asserts that if the slope estimate gets larger, the intercept estimate needs to become smaller, which makes some sense.
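The approximation is easy to check numerically (a sketch; vcov() returns the estimated covariance matrix of the coefficient estimates):

set.seed(3)                              # arbitrary seed
x <- rnorm(200, mean = 2)                # E(X) > 0
y <- 1 + 2 * x + rnorm(200)
V <- vcov(lm(y ~ x))                     # covariance matrix of (intercept, slope) estimates
V[1, 2] / sqrt(V[1, 1] * V[2, 2])        # correlation of the two estimates
-mean(x) / sqrt(mean(x^2))               # the approximation above

The two values essentially coincide.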
I'm thinking of something like this:
In the blue sample, the estimated slope is flatter, which means the intercept estimate can be larger. The estimated slope for the golden sample is somewhat steeper, so the intercept can be somewhat smaller to compensate.
On the other hand, if $E(X)=0$, we can have any slope without any constraints on the intercept.
The denominator of the formula can also be interpreted along these lines: if, for a given mean, the variability as measured by $E(X^2)$ increases, the data get smeared out over the $x$-axis, so that they effectively "look" more mean-zero again, loosening the constraint on the intercept.
Here's the code, which I hope explains the figure fully:
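A minimal sketch along those lines (the seed, sample sizes, colours, and true coefficients are arbitrary illustrative choices): two samples are drawn from the same model with $E(X)>0$, and the two fitted lines trade slope off against intercept.

set.seed(4)                                       # arbitrary seed
x1 <- rnorm(40, mean = 2)                         # E(X) > 0
y1 <- 1 + 2 * x1 + rnorm(40)                      # "blue" sample
x2 <- rnorm(40, mean = 2)
y2 <- 1 + 2 * x2 + rnorm(40)                      # "golden" sample from the same model
plot(x1, y1, col = "blue", xlim = range(x1, x2),
     ylim = range(y1, y2), xlab = "x", ylab = "y")
points(x2, y2, col = "goldenrod")
abline(lm(y1 ~ x1), col = "blue")                 # whichever fitted line is flatter
abline(lm(y2 ~ x2), col = "goldenrod")            # tends to have the larger intercept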