Solved – the expected value for the Pearson correlation coefficient

expected-value, pearson-r

The formula for Pearson's correlation coefficient can be written as:

$\rho_{X,Y}=\frac{\operatorname{E}[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X\sigma_Y}$

My understanding of the definition of $\operatorname{E}[X]$ for a discrete random variable is:

$\operatorname{E}[X] = x_1p_1 + x_2p_2 + \cdots + x_kp_k$

How does the $\operatorname{E}[…]$ part fit in with the formula above? Or, to put it another way, would the formula not be equally legible without the $\operatorname{E}[…]$?

Best Answer

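The term in the numerator is the covariance of $X$ and $Y$, $\operatorname{cov}(X,Y) = \operatorname{E}[(X - \mu_X)(Y - \mu_Y)]$, and it measures how much the random variables $X$ and $Y$ vary together. The $\operatorname{E}[…]$ cannot be dropped: the product $(X - \mu_X)(Y - \mu_Y)$ is itself a random variable, and taking its expectation over the joint distribution of $X$ and $Y$ is what turns it into a single number.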
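To make the role of the expectation concrete, here is a minimal sketch for the discrete case. It applies the same probability-weighted sum as in the question's definition of $\operatorname{E}[X]$, but to the product $(X - \mu_X)(Y - \mu_Y)$, using the joint probabilities of the pairs $(x, y)$. The joint distribution `joint_pmf` and the helper `expectation` are made up for illustration and are not part of the original question or answer.

```python
# Hypothetical joint distribution of two discrete random variables X and Y.
# Keys are (x, y) outcomes, values are their joint probabilities (they sum to 1).
joint_pmf = {
    (0, 0): 0.4,
    (0, 1): 0.1,
    (1, 0): 0.1,
    (1, 1): 0.4,
}


def expectation(g, pmf):
    """E[g(X, Y)]: the probability-weighted sum of g over all outcomes."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())


# Means are expectations of X and Y themselves.
mu_x = expectation(lambda x, y: x, joint_pmf)
mu_y = expectation(lambda x, y: y, joint_pmf)

# Covariance is the expectation of the product of the centered variables.
cov_xy = expectation(lambda x, y: (x - mu_x) * (y - mu_y), joint_pmf)

# Variances are the special cases cov(X, X) and cov(Y, Y).
var_x = expectation(lambda x, y: (x - mu_x) ** 2, joint_pmf)
var_y = expectation(lambda x, y: (y - mu_y) ** 2, joint_pmf)

# Pearson's correlation coefficient.
rho = cov_xy / (var_x * var_y) ** 0.5
print(cov_xy, rho)
```

Without the expectation, `(x - mu_x) * (y - mu_y)` would just be a different value for every outcome; the weighted sum is what collapses those values into one covariance.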

However, the covariance can be large simply because one of the random variables has a large variance. That is why $\operatorname{cov}(X,Y)$ is divided by $\sqrt{\operatorname{var}(X) \cdot \operatorname{var}(Y)} = \sqrt{\operatorname{cov}(X,X) \cdot \operatorname{cov}(Y,Y)} = \sigma_X\sigma_Y$. By the Cauchy-Schwarz inequality, $|\operatorname{cov}(X,Y)| \le \sigma_X\sigma_Y$, so the resulting $\rho$ always lies in the interval $[-1,1]$.
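The effect of the normalization can be checked numerically. The sketch below uses simulated data (the sample size, seed, and the factor of 100 are arbitrary choices for illustration) and sample analogues of the expectations, weighting each observation by $1/n$. Rescaling $X$, as a change of units would, inflates the covariance but should leave the correlation unchanged.

```python
import math
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample analogue of E[(A - mu_A)(B - mu_B)], weighting each pair by 1/n."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def corr(a, b):
    """Covariance divided by the product of the standard deviations."""
    return cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))


random.seed(0)
x = [random.gauss(0, 1) for _ in range(1000)]
y = [xi + random.gauss(0, 1) for xi in x]  # y depends on x plus noise
x_scaled = [100 * xi for xi in x]          # same variable in different units

print("covariance: ", cov(x, y), cov(x_scaled, y))
print("correlation:", corr(x, y), corr(x_scaled, y))
```

The two covariances differ by the scale factor, while the two correlations agree up to floating-point rounding and stay within $[-1,1]$.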