Regression Coefficient – Why It Is Covariance/Variance

covariance, descriptive statistics, regression, regression coefficients, variance

As in the title: I'd like to understand the intuition behind how regression coefficients are calculated. Why does $$\frac{cov(x,y)}{var(x)}$$ give the regression coefficient for dependent variable $y$ and independent variable $x$? This feels like an elementary question, but I have looked around the internet and can't find anything that answers it specifically.

Best Answer

Why $$\frac{cov(x,y)}{var(x)}$$ gives a regression coefficient for dependent variable y and independent variable x?

A linear regression coefficient tells us: If predictor variable $x$ increases by 1, what is the expected increase in outcome variable $y$?

The answer to this question depends in large part on the scales on which $x$ and $y$ are measured. E.g., if $x$ is a measure of length, imagine measuring it in millimeters instead of centimeters: the variance of the measurements in millimeters will be $10^2$ times the variance of the same measurements in centimeters, and their covariance with $y$ will be multiplied by 10. Note that $cov(x,y)$ is determined by three things:

  1. the linear association between $x$ and $y$;
  2. the scale of $x$;
  3. the scale of $y$.

Because of 2) and 3), I would call the covariance an unstandardized measure of association. Its value is difficult to interpret, because what would be a large and what would be a small value depends on the scales of $x$ and $y$. The correlation coefficient, however, gives us a standardized measure of association: It is 'corrected' for the scales on which $x$ and $y$ are measured:

$$cor(x,y)=\frac{cov(x,y)}{\sqrt{var(x) \cdot var(y)}}$$

The correlation coefficient tells us: If $x$ increases by one standard deviation, $\sqrt{var(x)}$, by how many standard deviations, $\sqrt{var(y)}$, is outcome $y$ expected to increase? Thus, with a correlation coefficient of 1, an increase of 1 SD in $x$ is associated with an increase of 1 SD in $y$.

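Now, the regression coefficient quantifies the expected increase in $y$ when $x$ increases by 1. We thus need to 'correct' the covariance between $x$ and $y$ for the scale of $x$, but not for the scale of $y$. If $x$ is multiplied by a constant $a$, the covariance is multiplied by $a$ and the variance of $x$ by $a^2$, so their ratio is expressed in units of $y$ per unit of $x$, which is exactly what a slope should be. We can therefore do the correction by simply dividing: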

$$\frac{cov(x,y)}{var(x)}$$

Note that we could also 'reverse' the problem and ask: If $y$ increases by 1, what is the expected increase in $x$? We can compute the answer as follows:

$$\frac{cov(x,y)}{var(y)}$$
