Regression – Correlation Between OLS Estimators for Intercept and Slope

Tags: estimators, least squares, regression

In a simple regression model,

$$ y = \beta_0 + \beta_1 x + \varepsilon, $$

the OLS estimators $\hat{\beta}_0^{OLS}$ and $\hat{\beta}_1^{OLS}$ are correlated.

The formula for the correlation between the two estimators is (if I have derived it correctly):

$$ \operatorname{Corr}(\hat{\beta}_0^{OLS},\hat{\beta}_1^{OLS}) = \frac{-\sum_{i=1}^{n}x_i}{\sqrt{n} \sqrt{\sum_{i=1}^{n}x_i^2} }. $$
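(A quick numerical check of this formula, as a minimal sketch; the sample size, true coefficients, and design distribution below are arbitrary choices. `cov2cor(vcov(fit))` gives the correlation implied by the estimated covariance matrix of the coefficients, and it should match the closed-form expression exactly, because the error-variance estimate cancels in the ratio:

# numerical check of the correlation formula (arbitrary example data)
set.seed(1)
n <- 50
x <- runif(n, 2, 3)
y <- 1 + 2*x + rnorm(n)
fit <- lm(y ~ x)
cov2cor(vcov(fit))[1, 2]                # correlation from the fitted model
-sum(x) / (sqrt(n) * sqrt(sum(x^2)))    # closed-form expression above

The two values agree up to floating-point error, which also shows that the correlation depends only on the design points $x_i$, not on the errors.)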

Questions:

  1. What is the intuitive explanation for the presence of correlation?
  2. Does the presence of correlation have any important implications?

The post was edited and the assertion that the correlation vanishes with sample size has been removed. (Thanks to @whuber and @ChristophHanck.)

Best Answer

Let me try it as follows (really not sure if that is useful intuition):

Dividing the numerator and denominator of the formula above by $n$, the correlation equals $-\bar{x}/\sqrt{n^{-1}\sum_i x_i^2}$, which by the law of large numbers will, for large $n$, roughly be $$-\frac{E(X)}{\sqrt{E(X^2)}}.$$ Thus, if $E(X)>0$ instead of $E(X)=0$, most of the data will be clustered to the right of zero. Then, if the slope estimate gets larger, the correlation formula asserts that the intercept estimate needs to become smaller, which makes some sense.
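For the design used in the figure code below, $X \sim \mathrm{Uniform}(2,3)$, so $E(X)=2.5$ and $E(X^2)=\operatorname{Var}(X)+E(X)^2=1/12+6.25$, giving a limiting correlation of about $-0.99$. A quick sketch confirming this:

# large-n value of the correlation for X ~ Uniform(2,3)
set.seed(2)
x <- runif(1e6, 2, 3)
-mean(x) / sqrt(mean(x^2))      # sample version, approx -0.993
-2.5 / sqrt(1/12 + 2.5^2)       # population version, -E(X)/sqrt(E(X^2))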

I'm thinking of something like this:

In the blue sample, the estimated slope is flatter, which means the estimated intercept can be larger. The estimated slope for the golden sample is somewhat steeper, so its intercept is somewhat smaller to compensate.

[Figure: the golden and blue samples with their fitted regression lines, the true line, and a dashed vertical line at $x=0$; produced by the code below.]

On the other hand, if $E(X)=0$, we can have any slope without any constraints on the intercept.
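(A one-line check using the closed-form formula, with the same arbitrary design as before but centered: with mean-zero $x$ the numerator $\sum_i x_i$ vanishes, so the correlation is zero.

# centering x removes the correlation between the estimators
set.seed(3)
x <- runif(30, 2, 3)
xc <- x - mean(x)                            # now mean(xc) = 0
-sum(xc) / (sqrt(30) * sqrt(sum(xc^2)))      # zero up to floating-point error
)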

The denominator of the formula can also be interpreted along these lines: if, for a given mean, the variability as measured by $E(X^2)$ increases, the data gets smeared out over the $x$-axis, so that it effectively "looks" more mean-zero again, loosening the constraints on the intercept for a given mean of $X$.
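This can also be seen directly from the limiting expression: for a fixed mean $\mu = E(X)$, we have $E(X^2) = \mu^2 + \operatorname{Var}(X)$, so increasing the spread drives the correlation toward zero. A small sketch (the variance values are arbitrary):

# fixed mean, growing spread: the correlation shrinks toward 0
mu <- 2.5
v  <- c(0.1, 1, 10, 100)        # increasing Var(X)
-mu / sqrt(mu^2 + v)            # approx -0.99, -0.93, -0.62, -0.24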

Here's the code, which I hope explains the figure fully:

n <- 30
beta <- 2                       # true slope; the true intercept is 0

x_1 <- sort(runif(n, 2, 3))
y_1 <- x_1*beta + rnorm(n)      # the golden sample

x_2 <- sort(runif(n, 2, 3))
y_2 <- x_2*beta + rnorm(n)      # the blue sample

xax <- seq(-1, 3, by = .001)
plot(x_1, y_1, xlim = c(-1, 3), ylim = c(-4, 7), pch = 19, col = "gold",
     xlab = "x", ylab = "y")
abline(lm(y_1 ~ x_1), col = "gold", lwd = 2)       # fitted line, golden sample
abline(v = 0, lty = 2)                             # the y-axis, where the intercepts live
lines(xax, beta*xax)                               # the true regression line, y = 2x
abline(lm(y_2 ~ x_2), col = "lightblue", lwd = 2)  # fitted line, blue sample
points(x_2, y_2, pch = 19, col = "lightblue")