Motivation for the definition of the Pearson correlation coefficient

Tags: definition, intuition, motivation, probability, statistics

Let $X$ and $Y$ be two random variables with joint distribution $P_{X,Y}$ and marginal distributions $P_X$ and $P_Y$. The Pearson correlation coefficient is defined to be $$\rho_{X,Y}=\dfrac{\mathbb{E}(XY)-\mathbb{E}(X)\mathbb{E}(Y)}{\sigma_X\sigma_Y}\tag{1}$$
where $\mathbb{E}$ means the mean value and $\sigma_X,\sigma_Y$ are the respective standard deviations.
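To make definition (1) concrete, here is a small numerical sketch (not from the original post; the linear relationship $Y = 2X + \text{noise}$ is an illustrative assumption) that estimates $\rho_{X,Y}$ from samples using formula (1) directly:

```python
import math
import random

random.seed(0)

# Illustrative data: Y depends linearly on X plus independent noise.
# Here Var(X) = 1, Var(Y) = 4 + 1 = 5, Cov(X, Y) = 2, so the true
# correlation is 2 / sqrt(5), about 0.894.
n = 10_000
x = [random.gauss(0, 1) for _ in range(n)]
y = [2 * xi + random.gauss(0, 1) for xi in x]

def mean(v):
    return sum(v) / len(v)

def std(v):
    m = mean(v)
    return math.sqrt(sum((t - m) ** 2 for t in v) / len(v))

# Formula (1): rho = (E[XY] - E[X]E[Y]) / (sigma_X * sigma_Y)
rho = (mean([a * b for a, b in zip(x, y)]) - mean(x) * mean(y)) / (std(x) * std(y))
print(rho)  # close to 0.894
```

Values near $+1$ or $-1$ indicate a strong linear relationship; values near $0$ indicate no linear relationship.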

This is meant to quantify correlation. As Wikipedia puts it:

In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data.

My question is: given this intuitive idea about correlation, what is the motivation to define (1) as a quantifier of correlation? How do we motivate definition (1)?

The linked page also hints that, "Mathematically, it is defined as the quality of least squares fitting to the original data". But I still fail to see why this would be a good quantifier of correlation.

Best Answer

It helps to instead write an equivalent definition,$$\rho_{X,\,Y}=\frac{\Bbb E((X-\Bbb EX)(Y-\Bbb EY))}{\sigma_X\sigma_Y}.$$This is a covariance divided by a product of standard deviations. I've explained before that covariance is an inner product (with some qualifying statements you'll find at that link). Standard deviation is then like a length (variance is the squared length), so the above formula is analogous to$$\cos\theta=\frac{a\cdot b}{|a||b|}.$$In particular, perfectly correlated variables are "parallel" in a vector space of random variables, whereas uncorrelated ones are orthogonal.
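The cosine analogy can be checked numerically. The sketch below (illustrative data, not from the answer) centers two samples and compares the covariance-over-standard-deviations definition with the literal cosine of the angle between the centered sample vectors; algebraically the factors of $n$ cancel, so the two agree up to floating point:

```python
import math
import random

random.seed(1)

# Illustrative samples with a linear relationship plus noise:
# true correlation is 1 / sqrt(2), about 0.707.
n = 5_000
x = [random.gauss(0, 1) for _ in range(n)]
y = [xi + random.gauss(0, 1) for xi in x]

mx = sum(x) / n
my = sum(y) / n

# Center the samples: subtract the sample means E[X], E[Y].
xc = [a - mx for a in x]
yc = [b - my for b in y]

# Pearson correlation as a cosine: inner product of the centered
# vectors divided by the product of their lengths.
dot = sum(a * b for a, b in zip(xc, yc))
norm_x = math.sqrt(sum(a * a for a in xc))
norm_y = math.sqrt(sum(b * b for b in yc))
cos_theta = dot / (norm_x * norm_y)

# Covariance / (sigma_X * sigma_Y), the definition in the answer.
rho = (dot / n) / ((norm_x / math.sqrt(n)) * (norm_y / math.sqrt(n)))

print(cos_theta, rho)  # identical up to floating point
```

Perfectly correlated samples would give $\cos\theta = \pm 1$ (parallel centered vectors); uncorrelated ones give $\cos\theta \approx 0$ (orthogonal).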
