[Math] Correlation Coefficient as Cosine

probability, probability-theory

I've read that the correlation coefficient between two random variables may be viewed as the cosine of the angle between them, but I can't find a solid explanation.

To be concrete, let $X$ and $Y$ be random variables on $(\Omega, \mathcal{F}, P)$ with correlation coefficient $\rho$. Assume $X,Y \in L^2(\Omega,\mathcal{F},P)$. The quantity $\rho$ is defined as
$$
\rho := \frac{Cov(X,Y)}{\sqrt{Var(X)Var(Y)}}.
$$
Letting $\mu_X := E(X)$ and $\mu_Y := E(Y)$, note
$$
Cov(X,Y) = E((X - \mu_X)(Y - \mu_Y)) = \left< X - \mu_X, Y - \mu_Y\right>_{L^2}
$$
and
$$
Var(X) = E((X - \mu_X)^2) = ||X - \mu_X||_{L^2}^2,
$$
where $L^2 := L^2(\Omega, \mathcal{F}, P)$. Thus
$$
\rho = \frac{\left< X - \mu_X, Y - \mu_Y\right>_{L^2}}{||X - \mu_X||_{L^2} ||Y - \mu_Y||_{L^2}}. \qquad (1)
$$
Compare this with the Euclidean space inner product result that for two vectors $x,y \in \mathbb{R}^n$,
$$
\cos(\theta) = \frac{x^Ty}{||x||\, ||y||},
$$
where $\theta$ is the angle between $x$ and $y$.
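As a purely numerical sanity check (my own addition, not part of the derivation): if we take $n$ samples of $X$ and $Y$ and view them as vectors in $\mathbb{R}^n$ with uniform weights, the sample correlation coefficient is exactly the Euclidean cosine formula applied to the centered vectors. A minimal sketch in Python/numpy, with made-up data and variable names of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)

# n samples of two dependent random variables, viewed as vectors in R^n
n = 10_000
x = rng.normal(size=n)
y = 0.6 * x + 0.8 * rng.normal(size=n)

# Correlation from the usual definition: Cov(X, Y) / sqrt(Var(X) Var(Y))
rho_def = np.cov(x, y, ddof=0)[0, 1] / np.sqrt(np.var(x) * np.var(y))

# Cosine of the angle between the *centered* vectors, as in Eqn. (1)
xc, yc = x - x.mean(), y - y.mean()
cos_centered = xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc))

print(rho_def, cos_centered)            # the two numbers agree
assert np.isclose(rho_def, cos_centered)
```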

The recurring claim I read is that I can think of $\rho$ as the cosine of the "angle" between the two random variables, but this seems to make sense only in terms of the $L^2$ inner product. In that case, the only notion of "angle" between two elements $X,Y \in L^2$ I can think of is
$$
\cos(\theta) = \frac{\left< X,Y \right>_{L^2}}{||X||_{L^2} ||Y||_{L^2}}. \qquad (2)
$$
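The quantity in Eqn. (2) is just as easy to compute in the finite-dimensional analogue, this time with the raw (uncentered) vectors. Another small sketch, with made-up data and a deliberately nonzero mean (both my own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, size=10_000)     # nonzero mean on purpose
y = 0.6 * x + 0.8 * rng.normal(size=10_000)

# Cosine of the angle between the raw (uncentered) vectors, as in Eqn. (2)
cos_uncentered = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
print(cos_uncentered)
```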

So, two questions:

  1. Is Eqn. (2) a valid notion of angle in $L^2$?
  2. Are Eqns. (1) and (2) somehow equivalent?

If both of these are true, I can justify viewing $\rho = \cos(\theta)$.

Best Answer

$\newcommand{\lowersub}[1]{_{\lower{0.5ex}{\small #1}}}$ The Euclidean norm of an $n$-dimensional vector $x = (x_1, x_2,\dots,x_n)^\top$ is ${\lVert x\rVert}_2 = \sqrt{\sum\limits_{k=1}^n x^2_k}$, which is a measure of its distance from the origin.

In a similar way, $\sqrt{\mathsf E\Big(\!\big(X-\mathsf E(X)\big)^2\Big)}$ is a measure of how far a random variable deviates from its mean. Thus the standard deviation may be regarded as the probability-weighted norm of the centred random variable.

(As A.S. commented, centering gives a more meaningful comparison of how two random variables deviate from their means.)

For a discrete random variable we have: $$\lVert X-\mu\lowersub X\rVert = \sqrt{\sum_{\forall x} (x-\mu\lowersub{X})^2\mathsf P(X=x)}$$

And for a continuous real-valued random variable we have: $$\lVert X-\mu\lowersub{X}\rVert = \sqrt{\int_\Bbb R (x-\mu\lowersub{X})^2\,f\lowersub{X}(x)\operatorname d x}$$
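As a quick numerical illustration of the discrete formula (the pmf below is a made-up toy example, not from the answer): the probability-weighted norm of the centred values coincides with the standard deviation computed the usual way.

```python
import numpy as np

# A toy discrete random variable: values and their probabilities
values = np.array([1.0, 2.0, 3.0, 4.0])
probs  = np.array([0.1, 0.2, 0.3, 0.4])

mu = np.sum(values * probs)                                  # E(X)
norm_centered = np.sqrt(np.sum((values - mu) ** 2 * probs))  # ||X - mu|| in L^2(P)
std = np.sqrt(np.sum(values ** 2 * probs) - mu ** 2)         # usual sd formula

print(norm_centered, std)             # identical: the sd is the weighted norm
assert np.isclose(norm_centered, std)
```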

Likewise, the angle between two $n$-dimensional vectors $\vec x, \vec y$ is related to their inner product and norms by the cosine rule:

$$\cos \alpha\lowersub{\vec x,\vec y} = \frac{\langle \vec x, \vec y\rangle}{{\lVert \vec x\rVert}_2{\lVert \vec y\rVert}_2}$$

Similarly, the covariance of two centered random variables is analogous to an inner product, and so we have the concept of correlation as the cosine of an angle:

$$\rho\lowersub{X,Y}=\dfrac{\mathsf E((X-\mu\lowersub{X})(Y-\mu\lowersub{Y}))}{\sqrt{\mathsf E((X-\mu\lowersub{X})^2)\,\mathsf E((Y-\mu\lowersub{Y})^2)}}=\dfrac{\langle X-\mu\lowersub{X} , Y-\mu\lowersub{Y}\rangle}{\lVert X-\mu\lowersub{X}\rVert\,\lVert Y-\mu\lowersub{Y}\rVert} = \cos \theta\lowersub{X,Y}$$
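As a final sanity check (again my own sketch, with a made-up joint pmf, not part of the answer): writing the expectations in the last display as probability-weighted sums on a finite probability space gives the same $\rho$ as the usual $Cov/\sqrt{Var\,Var}$ formula.

```python
import numpy as np

# A made-up joint pmf on a 3x3 grid of (x, y) values
x_vals = np.array([0.0, 1.0, 2.0])
y_vals = np.array([-1.0, 0.0, 1.0])
p = np.array([[0.10, 0.05, 0.05],
              [0.05, 0.20, 0.15],
              [0.05, 0.15, 0.20]])    # rows: x, columns: y; entries sum to 1

X, Y = np.meshgrid(x_vals, y_vals, indexing="ij")

mu_x = np.sum(X * p)
mu_y = np.sum(Y * p)

# <X - mu_X, Y - mu_Y> and the weighted norms, all as probability-weighted sums
inner  = np.sum((X - mu_x) * (Y - mu_y) * p)
norm_x = np.sqrt(np.sum((X - mu_x) ** 2 * p))
norm_y = np.sqrt(np.sum((Y - mu_y) ** 2 * p))

rho = inner / (norm_x * norm_y)       # correlation = cos(theta) of centered variables
print(rho)

# Cross-check against Cov / sqrt(Var * Var) computed directly
cov   = np.sum(X * Y * p) - mu_x * mu_y
var_x = np.sum(X ** 2 * p) - mu_x ** 2
var_y = np.sum(Y ** 2 * p) - mu_y ** 2
assert np.isclose(rho, cov / np.sqrt(var_x * var_y))
```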