Solved – Normalized correlation with a constant vector

correlation, cross-correlation

I am confused about how to interpret the result of a normalized correlation with a constant vector. The normalization divides by the standard deviation of both vectors (reference: http://en.wikipedia.org/wiki/Cross-correlation ). If one of them is constant (say, a vector of all 5's, which has a standard deviation of zero), then the correlation comes out as infinity, but shouldn't it in fact be zero? This doesn't seem to be just a corner case: in general, if the standard deviation of one of the vectors is small, the correlation with any other vector is very high, which obviously doesn't make sense. Can anyone explain what I am misinterpreting?

Best Answer

Let $\boldsymbol{x}$ and $\boldsymbol{y}$ be your two vectors and let $\boldsymbol{\bar{x}} \equiv \bar{x} \boldsymbol{1}$ and $\boldsymbol{\bar{y}} \equiv \bar{y} \boldsymbol{1}$ be constant vectors for the means of the two original vectors. The components of the sample correlation are:

$$s_{x,y}^2 = \frac{1}{n-1} (\boldsymbol{x} - \boldsymbol{\bar{x}}) \cdot (\boldsymbol{y} - \boldsymbol{\bar{y}}), \qquad s_x = \frac{1}{\sqrt{n-1}} ||\boldsymbol{x} - \boldsymbol{\bar{x}}||, \qquad s_y = \frac{1}{\sqrt{n-1}} ||\boldsymbol{y} - \boldsymbol{\bar{y}}||.$$

The sample correlation between $\boldsymbol{x}$ and $\boldsymbol{y}$ is just the cosine of the angle between the vectors $\boldsymbol{x} - \boldsymbol{\bar{x}}$ and $\boldsymbol{y} - \boldsymbol{\bar{y}}$. Letting this angle be $\theta$ we have:

$$\rho_{x,y} = \frac{(\boldsymbol{x} - \boldsymbol{\bar{x}}) \cdot (\boldsymbol{y} - \boldsymbol{\bar{y}})}{||\boldsymbol{x} - \boldsymbol{\bar{x}}|| \cdot ||\boldsymbol{y} - \boldsymbol{\bar{y}}||} = \cos \theta.$$
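As a quick numerical check, the sketch below computes the correlation two ways: as the cosine of the angle between the centered vectors, and as the sample covariance divided by the two sample standard deviations (using the usual $n-1$ divisor). The function names and the example data are made up for illustration; only the Python standard library is assumed.

```python
import math

def center(v):
    """Subtract the mean from every component of v."""
    mean = sum(v) / len(v)
    return [vi - mean for vi in v]

def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def cosine_correlation(x, y):
    """Cosine of the angle between the centered vectors."""
    xc, yc = center(x), center(y)
    return dot(xc, yc) / (math.sqrt(dot(xc, xc)) * math.sqrt(dot(yc, yc)))

def covariance_correlation(x, y):
    """Sample covariance divided by both sample standard deviations."""
    n = len(x)
    xc, yc = center(x), center(y)
    s_xy = dot(xc, yc) / (n - 1)             # sample covariance
    s_x = math.sqrt(dot(xc, xc) / (n - 1))   # sample standard deviation of x
    s_y = math.sqrt(dot(yc, yc) / (n - 1))   # sample standard deviation of y
    return s_xy / (s_x * s_y)

# Illustrative data (not from the question).
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 1.0, 5.0, 6.0]

r_cos = cosine_correlation(x, y)
r_cov = covariance_correlation(x, y)
print(r_cos, r_cov)  # the two values agree up to rounding
```

The factors of $n-1$ cancel between numerator and denominator, which is why the two computations agree.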

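Since multiplying either vector by a positive constant scales the covariance and that vector's standard deviation by the same factor, correlation is unaffected by scale. It is therefore not correct to say that a low standard deviation gives a high correlation: a small standard deviation in the denominator is matched by an equally small covariance in the numerator. What matters for correlation is the angle between the centered vectors, not their lengths.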
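To see the scale invariance concretely, here is a small sketch that shrinks one vector by a factor of 1000, so its standard deviation becomes tiny, and recomputes the correlation. The helper `correlation` and the data are illustrative assumptions, not part of the original question.

```python
import math

def correlation(x, y):
    """Sample correlation as the cosine between the centered vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    xc = [xi - mean_x for xi in x]
    yc = [yi - mean_y for yi in y]
    numerator = sum(a * b for a, b in zip(xc, yc))
    norm_x = math.sqrt(sum(a * a for a in xc))
    norm_y = math.sqrt(sum(b * b for b in yc))
    return numerator / (norm_x * norm_y)

# Illustrative data (not from the question).
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 1.0, 5.0, 6.0]

# Shrink y so that its standard deviation is 1000 times smaller.
y_small = [0.001 * yi for yi in y]

r_original = correlation(x, y)
r_shrunk = correlation(x, y_small)

# Multiplying by a negative constant flips the sign but not the magnitude.
r_flipped = correlation(x, [-yi for yi in y])

print(r_original, r_shrunk, r_flipped)
```

The shrunken vector has a much smaller standard deviation, yet the correlation is the same up to rounding, because the covariance shrinks by the same factor.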

In the special case where $\boldsymbol{y} \propto \boldsymbol{1}$ (i.e., $\boldsymbol{y}$ is a constant vector) you have $\boldsymbol{y} - \boldsymbol{\bar{y}} = \boldsymbol{0}$, which gives both $s_{x,y}^2 = 0$ and $s_{y} = 0$. The correlation then takes the indeterminate form $0/0$, so it is undefined: it is neither infinite nor zero. Geometrically, this occurs because there is no defined angle with the zero vector.
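The degenerate case can be sketched in code as well. Plain Python raises `ZeroDivisionError` on `0.0 / 0.0`, so the hypothetical helper below (`correlation_or_nan`, a name chosen for this example) checks for a zero-length centered vector and returns NaN to signal "undefined". This is one possible convention under stated assumptions, not a definitive implementation.

```python
import math

def correlation_or_nan(x, y):
    """Sample correlation, or NaN when either centered vector is zero."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    xc = [xi - mean_x for xi in x]
    yc = [yi - mean_y for yi in y]
    norm_x = math.sqrt(sum(a * a for a in xc))
    norm_y = math.sqrt(sum(b * b for b in yc))
    if norm_x == 0.0 or norm_y == 0.0:
        # A constant vector centers to the zero vector: the numerator is
        # also zero, so the ratio is 0/0 and the correlation is undefined.
        return math.nan
    return sum(a * b for a, b in zip(xc, yc)) / (norm_x * norm_y)

# Illustrative data: x varies, y_const is the all-5's vector from the question.
x = [1.0, 2.0, 4.0, 7.0]
y_const = [5.0, 5.0, 5.0, 5.0]
y_varying = [2.0, 1.0, 5.0, 6.0]

r_const = correlation_or_nan(x, y_const)
r_normal = correlation_or_nan(x, y_varying)
print(r_const, r_normal)
```

Returning NaN rather than 0 or infinity keeps the result honest: downstream code has to decide explicitly how to treat a constant input.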