[Math] Pearson correlation and metric properties

correlation, metric-spaces

Assume that the data set has been $z$-standardized to zero mean and unit variance (and that it contains no constant vectors).

Then Pearson's $r$ reduces to the covariance:
$$\rho(X,Y) := \frac{Cov(X,Y)}{\sigma(X)\sigma(Y)} = Cov(X,Y)$$

Now I'm investigating the dissimilarity function
$$d(X,Y):=\sqrt{1 - \rho(X,Y)}$$
which is the square root of a common transformation of $\rho$ for use as a dissimilarity function.

It can be shown that, given the above preconditions, it is in fact a constant multiple of the Euclidean distance, and thus trivially a metric:

$$\sqrt{\sum_i (x_i - y_i)^2} = \sqrt{\sum_i x_i^2+\sum_i y_i^2 - 2 \sum_i x_i\cdot y_i} \\
= \sqrt{n + n - 2n \cdot Cov(X,Y)} = \sqrt{2n} \cdot d(X,Y)$$
i.e.
$$d(X,Y) = euclidean(X,Y) / \sqrt{2n}$$
Now I'm wondering whether the properties of this function $d$ have been explored further. Is it a metric under a broader set of conditions than $z$-standardized data sets? Do you know of related literature or proofs?
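
The identity $d(X,Y) = euclidean(X,Y) / \sqrt{2n}$ derived above can be checked numerically. The following is only a minimal sketch, assuming NumPy and random test vectors; the helper names `zscore` and `pearson_dissimilarity` are made up for this illustration. It standardizes with the population standard deviation (dividing by $n$), which is what makes $\sum_i x_i^2 = n$ hold in the derivation.

```python
import numpy as np

def zscore(v):
    # Standardize with the population standard deviation (ddof=0),
    # so that the squared entries of the result sum to n.
    return (v - v.mean()) / v.std()

def pearson_dissimilarity(x, y):
    # d(X, Y) = sqrt(1 - rho(X, Y)); clamp at 0 to absorb rounding error.
    rho = np.corrcoef(x, y)[0, 1]
    return np.sqrt(max(0.0, 1.0 - rho))

# Hypothetical test data: two random vectors of length n.
rng = np.random.default_rng(0)
n = 50
x = zscore(rng.normal(size=n))
y = zscore(rng.normal(size=n))

d = pearson_dissimilarity(x, y)
euclid = np.linalg.norm(x - y)

# The two printed values should agree up to floating-point error.
print(d, euclid / np.sqrt(2 * n))
```

If the data were instead standardized with the sample standard deviation (dividing by $n-1$), the same argument would give a scaling factor of $\sqrt{2(n-1)}$ rather than $\sqrt{2n}$.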

Best Answer

Yes, there are many related papers, which I came across a couple of weeks ago, that discuss converting Pearson correlation to Euclidean distance when the data is z-normalized.

Your question dates back to 2013, but I hope you are still interested:

  1. StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time (the first paper to prove the relation and derive the formula).
  2. Exact Discovery of Time Series Motifs
  3. Logical Shapelets: An Expressive Primitive for Time
  4. On Similarity-Based Queries for Time Series Data
  5. Time Series Join on Subsequence Correlation

And a lot more.
