Solved – Maximising the correlation between vectors vs minimising the angle between them

Tags: correlation, linear-algebra

In this talk (4:58) David says that maximizing the correlation between vectors can be viewed as minimizing the angle between them, and gives two references: Breiman & Friedman (1985) and Hastie & Tibshirani (1990). The second of these is just their textbook, and I can't track down the first, although they had a paper around that time about Generalised Additive Models. In short, I can't find where they discuss this. Is the claim true? Does anyone have a definitive reference?

Best Answer

It is often useful to geometrically represent random variables $X_{1}, \ldots, X_{p}$ (theoretical or empirical data) as vectors $\bf{x}_{1}, \ldots, \bf{x}_{p}$ such that their standard deviations $\sigma(X_{i})$ equal their lengths $||\bf{x}_{i}||$, and their correlations $\rho(X_{i}, X_{j})$ equal the cosine of their angles $\angle(\bf{x}_{i}, \bf{x}_{j})$. One can then use graphical illustrations and geometric intuitions to gain statistical insight.

To this end, let $\bf{\Sigma}$ be the $(p \times p)$-covariance matrix of $X_{1}, \ldots, X_{p}$ with rank $k$. Since $\bf{\Sigma}$ is positive semidefinite, we can find a decomposition $\bf{\Sigma} = \bf{B} \bf{B}'$ by defining the $(p \times k)$-matrix $\bf{B} := \bf{G} \bf{D}^{1/2}$, where $\bf{G}$ is the $(p \times k)$-matrix of eigenvectors of $\bf{\Sigma}$ and $\bf{D}$ is the $(k \times k)$-diagonal matrix of corresponding positive eigenvalues.
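A minimal NumPy sketch of this construction (the covariance matrix `Sigma` below is a made-up example; any positive-semidefinite matrix works):

```python
import numpy as np

# Hypothetical (p x p) covariance matrix for p = 3 variables (illustration only).
Sigma = np.array([[4.0, 2.0, 0.5],
                  [2.0, 3.0, 1.0],
                  [0.5, 1.0, 1.0]])

# Eigendecomposition of the symmetric matrix Sigma.
eigvals, eigvecs = np.linalg.eigh(Sigma)

# Keep only the strictly positive eigenvalues (k = rank of Sigma).
keep = eigvals > 1e-12
G = eigvecs[:, keep]             # (p x k) matrix of eigenvectors
d_half = np.sqrt(eigvals[keep])  # square roots of the positive eigenvalues

# B = G D^(1/2); row-wise scaling is equivalent to G @ np.diag(d_half).
B = G * d_half                   # (p x k)

# B B' reproduces Sigma, so the rows of B are the desired vectors x_i.
assert np.allclose(B @ B.T, Sigma)
```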

$\bf{B} \bf{B}'$ is the matrix of dot products of the rows of $\bf{B}$, i.e., $\bf{\Sigma}_{ij} = \langle\bf{B}_{i}, \bf{B}_{j}\rangle = \bf{B}_{i}'\bf{B}_{j}$. Now we get the desired representation in $k$-dimensional space by defining $\bf{x}_{i} := \bf{B}_{i}$, because then we have $$ ||\bf{x}_{i}|| = \sqrt{\langle\bf{x}_{i}, \bf{x}_{i}\rangle} = \sqrt{\bf{\Sigma}_{ii}} = \sigma(X_{i}) $$

And we also have (assuming $||\bf{x}_{i}|| > 0$ and $||\bf{x}_{j}|| > 0$) $$ \begin{array}{rcl} \cos(\angle(\bf{x}_{i}, \bf{x}_{j})) &=& \frac{\langle\bf{x}_{i}, \bf{x}_{j}\rangle}{||\bf{x}_{i}|| \cdot ||\bf{x}_{j}||} = \frac{\langle\bf{x}_{i}, \bf{x}_{j}\rangle}{\sqrt{\langle\bf{x}_{i}, \bf{x}_{i}\rangle} \cdot \sqrt{\langle\bf{x}_{j}, \bf{x}_{j}\rangle}}\\ &=& \frac{\sum_{r} \bf{x}_{ir} \bf{x}_{jr}}{\sqrt{\sum_{r} \bf{x}_{ir}^{2}} \, \sqrt{\sum_{r} \bf{x}_{jr}^{2}}} = \frac{Cov(X_{i}, X_{j})}{\sigma(X_{i}) \, \sigma(X_{j})}\\ &=& \rho(X_{i}, X_{j}) \end{array} $$
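Both properties can be checked numerically. The sketch below (with a made-up covariance matrix, purely for illustration) verifies that the row lengths of $\bf{B}$ equal the standard deviations and that the cosines between rows equal the correlations:

```python
import numpy as np

# Hypothetical covariance matrix (illustration only).
Sigma = np.array([[4.0, 2.0, 0.5],
                  [2.0, 3.0, 1.0],
                  [0.5, 1.0, 1.0]])

# Build B = G D^(1/2) from the eigendecomposition of Sigma.
eigvals, eigvecs = np.linalg.eigh(Sigma)
keep = eigvals > 1e-12
B = eigvecs[:, keep] * np.sqrt(eigvals[keep])  # rows of B are the vectors x_i

# ||x_i|| = sigma(X_i): row lengths equal the standard deviations.
lengths = np.linalg.norm(B, axis=1)
assert np.allclose(lengths, np.sqrt(np.diag(Sigma)))

# cos(angle(x_i, x_j)) = rho(X_i, X_j): cosines equal the correlations.
cosines = (B @ B.T) / np.outer(lengths, lengths)
sd = np.sqrt(np.diag(Sigma))
rho = Sigma / np.outer(sd, sd)
assert np.allclose(cosines, rho)
```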

Since $\cos$ is strictly decreasing on $[0, \pi]$ with $\cos(0) = 1$, maximizing the correlation between variables can be viewed as minimizing the angle between their corresponding vectors.

If we have empirical data $n$-vectors $\bf{x}_{i}$ with mean vectors $\bar{\bf{x}}_{i}$, then the representation immediately follows for the corresponding centered variables $\dot{\bf{x}}_{i}$ since $\langle\dot{\bf{x}}_{i}, \dot{\bf{x}}_{j}\rangle = \sum\limits_{r=1}^{n}(\bf{x}_{ir} - \bar{\bf{x}}_{i})(\bf{x}_{jr} - \bar{\bf{x}}_{j}) = n \, Cov(X_{i}, X_{j})$.

So in this case, $\dot{\bf{x}}_{i} / \sqrt{n}$ is already the desired representation, although it lives in $n$-dimensional space, whereas $k \leqslant n$ dimensions suffice in general.
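A short sketch of the empirical case, using randomly generated data (the mixing matrix is an arbitrary choice to make the columns correlated): centering the columns and dividing by $\sqrt{n}$ yields vectors whose dot products are the (divisor-$n$) covariances and whose cosines are the correlations.

```python
import numpy as np

# Generate n observations of p correlated variables (arbitrary example data).
rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.standard_normal((n, p)) @ np.array([[1.0, 0.5, 0.0],
                                            [0.0, 1.0, 0.3],
                                            [0.0, 0.0, 1.0]])

Xc = X - X.mean(axis=0)   # centered data
V = Xc.T / np.sqrt(n)     # rows of V are the vectors x_i_dot / sqrt(n)

# Dot products reproduce the divisor-n covariance matrix ...
assert np.allclose(V @ V.T, np.cov(X, rowvar=False, bias=True))

# ... and cosines between rows reproduce the correlation matrix.
lengths = np.linalg.norm(V, axis=1)
cosines = (V @ V.T) / np.outer(lengths, lengths)
assert np.allclose(cosines, np.corrcoef(X, rowvar=False))
```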

For applications, see e.g. this answer or this answer.
