Does the vector version of the Cauchy-Schwarz inequality ensure that the correlation coefficient is bounded by 1?

correlation, mathematical-statistics, probability-inequalities

I have been trying to understand the proof that the correlation between two random variables $X$ and $Y$ is between $-1$ and $1$. For simplicity, suppose $X$ and $Y$ have mean zero. Then

$$\mathrm{corr}(X,Y)=\frac{\mathbb{E}[XY]}{\sqrt{\mathbb{E}[X^2]\mathbb{E}[Y^2]}}.$$

I know that there's a proof that doesn't use the Cauchy-Schwarz inequality but I'd like to understand the one that does.

To show that correlation cannot exceed $1$, I want to show that:

$$\mathbb E[XY] \le \sqrt{\mathbb E[X^2] \mathbb E[Y^2]}.$$

Now this does look exactly like one half of the CS inequality ($\rho \ge -1$ is the other half) but we have the $\mathbb E[\cdot]$ operator wrapped around things and instead of inner products of vectors, $\langle x,y \rangle $, we have products of random variables. The proofs I've seen stop here and say "true by the CS inequality". Apparently there is a probabilistic version of it.

My question is, starting from one side of the CS inequality for vectors I know from linear algebra $$\langle x,y \rangle \le \sqrt{\langle x,x \rangle}\sqrt{\langle y,y \rangle},$$ are there steps that actually take me to the above equation?

Edit: The answer and comments helped, although I simply don't know enough math to really follow. If someone else is in the same boat, I found a link that explicitly goes into why and how random variables can be thought of as vectors in a vector space: http://www.math.uah.edu/stat/expect/Spaces.html

Best Answer

The Cauchy-Schwarz inequality is the same; only the inner product is different. You may have seen $E(X)$ written as an integral. The inner product to use is $$\langle X,Y\rangle:=\int_{\Omega} XY\,\mathrm{d}P=\mathbb{E}[XY]$$ for real random variables with finite second moments on a probability space $\Omega$ with probability measure $P$. Now you have to check the inner-product axioms, in particular definiteness. To get the idea there, remember that "$=$" is too strict in probability: $\langle X,X\rangle=0$ only forces $X=0$ almost surely, so random variables that differ only on a set of measure $0$, i.e. on events that occur with probability zero, are treated as the same element.
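
To connect this to the vector inequality from the question, here is a sketch of the steps. The finite sample space, the symbols $p_i, x_i, y_i, u, v$, and the auxiliary parameter $t$ are assumptions introduced only for illustration; they do not appear in the original question.

Suppose first that $\Omega=\{\omega_1,\dots,\omega_n\}$ is finite, with $p_i=P(\{\omega_i\})$, $x_i=X(\omega_i)$ and $y_i=Y(\omega_i)$. Define the ordinary vectors $u,v\in\mathbb{R}^n$ by $u_i=\sqrt{p_i}\,x_i$ and $v_i=\sqrt{p_i}\,y_i$. Then

$$\mathbb{E}[XY]=\sum_{i=1}^n p_i x_i y_i=\langle u,v\rangle,\qquad \mathbb{E}[X^2]=\langle u,u\rangle,\qquad \mathbb{E}[Y^2]=\langle v,v\rangle,$$

so the linear-algebra inequality $\langle u,v\rangle\le\sqrt{\langle u,u\rangle}\sqrt{\langle v,v\rangle}$ is literally

$$\mathbb{E}[XY]\le\sqrt{\mathbb{E}[X^2]\,\mathbb{E}[Y^2]}.$$

In the general case there are no coordinates, but the usual proof of Cauchy-Schwarz uses only the inner-product axioms, so it carries over. Assuming $\mathbb{E}[X^2]$ and $\mathbb{E}[Y^2]$ are finite and $\mathbb{E}[Y^2]>0$, for every real $t$

$$0\le\mathbb{E}\big[(X-tY)^2\big]=\mathbb{E}[X^2]-2t\,\mathbb{E}[XY]+t^2\,\mathbb{E}[Y^2].$$

A quadratic in $t$ that is never negative has a non-positive discriminant, which gives

$$\big(\mathbb{E}[XY]\big)^2\le\mathbb{E}[X^2]\,\mathbb{E}[Y^2].$$

Taking square roots yields both $\mathbb{E}[XY]\le\sqrt{\mathbb{E}[X^2]\mathbb{E}[Y^2]}$ and $\mathbb{E}[XY]\ge-\sqrt{\mathbb{E}[X^2]\mathbb{E}[Y^2]}$, i.e. $-1\le\mathrm{corr}(X,Y)\le 1$ for mean-zero $X$ and $Y$.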
