How does the formula of the correlation coefficient measure a “linear” relationship?

probability, statistics

We know that Pearson's correlation coefficient measures the strength of the relationship (how correlated they are) between two random variables. But what about $\textbf{linearity}$? How does this very formula:

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

specifically measure a $\textbf{linear}$ relationship? Is there an intuitive way to look at it that explains why it quantifies a linear relationship?

Best Answer

To show how Pearson's correlation coefficient (simply "r" from now on) measures the strength of the linear relationship between two variables, it is useful to show that if one variable is a linear function of the other with positive slope, then $r = 1$.

That is:

$$\forall a > 0,\ b \in \mathbb{R}: \quad Y = aX + b \Rightarrow Cov(X, Y) = \sqrt{Var(X)}\sqrt{Var(Y)}$$

where the latter clearly implies $r = 1$, since $r = Cov(X, Y) / (\sqrt{Var(X)}\sqrt{Var(Y)})$ (assuming $Var(X) > 0$).

Proof:

\begin{align} Cov(X, Y) &= E(XY) - E(X)E(Y) \\ &= E[X(aX + b)] - E(X)E(aX + b) \\ &= E(aX^{2} + bX) - a[E(X)]^{2} - bE(X) \\ &= a[E(X^{2}) - [E(X)]^{2}] + bE(X) - bE(X) \\ &= aVar(X) \end{align}

where I have used $E(aX) = aE(X)$ and $E(b) = b$ for constants $a$ and $b$.

We also have:

$$Var(Y) = Var(aX + b) = a^{2}Var(X)$$

using $Var(aX) = a^{2}Var(X)$ and $Var(b) = 0$ for constants $a$ and $b$. This implies:

$$\sqrt{Var(Y)} = |a|\sqrt{Var(X)} = a\sqrt{Var(X)} \quad \text{(since } a > 0\text{)}$$

from which we finally obtain that:

$$\sqrt{Var(X)}\sqrt{Var(Y)} = aVar(X)$$

proving the claim. Similarly, one can prove that $r = -1$ if $Y = -aX + b$ with $a > 0$: the covariance becomes $-aVar(X)$, while $\sqrt{Var(Y)} = a\sqrt{Var(X)}$ is unchanged because $Var(-X) = Var(X)$.
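As a quick numerical sanity check of the two cases above, here is a minimal sketch that evaluates the sample formula for $r$ from the question directly. The helper name `pearson_r`, the data points, and the coefficients $a = 3$, $b = 2$ are arbitrary choices made for illustration, not part of the original derivation.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation, computed term by term from the formula for r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squared deviations.
    den = math.sqrt(
        sum((x - x_bar) ** 2 for x in xs) * sum((y - y_bar) ** 2 for y in ys)
    )
    return num / den

xs = [1.0, 2.0, 4.0, 7.0, 11.0]  # arbitrary illustrative data

r_pos = pearson_r(xs, [3.0 * x + 2.0 for x in xs])   # Y = aX + b with a = 3 > 0
r_neg = pearson_r(xs, [-3.0 * x + 2.0 for x in xs])  # Y = -aX + b, slope -3 < 0

print(r_pos, r_neg)  # expected to be 1 and -1 up to floating-point rounding
```

Changing $b$ or the magnitude of $a$ should leave both values unchanged, which matches the proof: only the sign of the slope matters.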

More generally, when $r$ lies strictly between $-1$ and $1$ (excluding the case $r = 0$, which means no linear relationship), the data show a partially linear relationship. That is, in a scatter plot of the two variables, most of the data points (excluding outliers) gather in a cloud around a line. The closer $r$ is to $-1$ or $1$, the more tightly the cloud hugs a negatively or positively sloped line, respectively; the closer $r$ is to $0$, the more dispersed the cloud is.
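The effect of dispersion around a line can be illustrated with a small sketch. Everything here is an assumption made for the example: the helper `pearson_r`, the line $y = 2x + 1$, and the deterministic alternating perturbation of size `s`, which stands in for scatter so that the result is reproducible. The last lines show a perfectly deterministic but non-linear relationship ($y = x^2$ on points symmetric about $0$), for which $r$ is zero.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation, computed directly from the formula for r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = math.sqrt(
        sum((x - x_bar) ** 2 for x in xs) * sum((y - y_bar) ** 2 for y in ys)
    )
    return num / den

xs = [float(i) for i in range(10)]

# Points on the line y = 2x + 1, pushed alternately above and below it by s.
# A larger s means a more dispersed cloud around the same line.
rs = []
for s in [0.0, 1.0, 5.0, 20.0]:
    ys = [2.0 * x + 1.0 + s * (-1) ** i for i, x in enumerate(xs)]
    rs.append(pearson_r(xs, ys))
print(rs)  # decreases from 1 towards 0 as s grows

# A non-linear relationship: y is fully determined by x, yet r vanishes.
sym = [-2.0, -1.0, 0.0, 1.0, 2.0]
r_quad = pearson_r(sym, [x * x for x in sym])
print(r_quad)
```

The second case is a reminder that $r = 0$ rules out a linear relationship only, not dependence in general.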
