Assume that the data set has been $z$-standardized to zero mean and unit variance (and that it contains no constant vectors).
Then Pearson's $r$ reduces to the covariance:
$$\rho(X,Y) := \frac{Cov(X,Y)}{\sigma(X)\sigma(Y)} = Cov(X,Y)$$
Now I'm investigating the dissimilarity function
$$d(X,Y):=\sqrt{1 - \rho(X,Y)}$$
which is the square root of a common transformation of $\rho$ for use as a dissimilarity function.
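It can be shown that, given the preconditions above, it is in fact a constant multiple of Euclidean distance, and thus trivially a metric. For standardized vectors of length $n$ (using the $1/n$ normalization for variance and covariance) we have $\sum_i x_i^2 = \sum_i y_i^2 = n$ and $\sum_i x_i y_i = n \cdot Cov(X,Y)$, so: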
$$\begin{aligned}\sqrt{\sum_i (x_i - y_i)^2} &= \sqrt{\sum_i x_i^2+\sum_i y_i^2 - 2 \sum_i x_i\cdot y_i} \\
&= \sqrt{n + n - 2n \cdot Cov(X,Y)} = \sqrt{2n} \cdot d(X,Y)\end{aligned}$$
i.e.
$$d(X,Y) = \operatorname{euclidean}(X,Y) / \sqrt{2n}$$
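As a quick numerical sanity check of the identity above, here is a minimal sketch in Python with NumPy. The helper names `zscore` and `corr_dissimilarity` are my own and purely illustrative; the sketch assumes the population ($1/n$) standard deviation, which is what the derivation implicitly uses.

```python
import numpy as np

def zscore(v):
    # Population z-score (NumPy's default ddof=0), matching the 1/n convention.
    return (v - v.mean()) / v.std()

def corr_dissimilarity(x, y):
    # d(X, Y) = sqrt(1 - rho(X, Y)); max() guards against tiny negative
    # values caused by floating-point rounding when rho is very close to 1.
    rho = np.corrcoef(x, y)[0, 1]
    return np.sqrt(max(0.0, 1.0 - rho))

rng = np.random.default_rng(0)
n = 50
x = zscore(rng.normal(size=n))
y = zscore(rng.normal(size=n))

cov_xy = np.mean(x * y)              # population covariance of standardized vectors
rho_xy = np.corrcoef(x, y)[0, 1]     # Pearson correlation
d = corr_dissimilarity(x, y)
scaled = np.linalg.norm(x - y) / np.sqrt(2 * n)

print(np.isclose(cov_xy, rho_xy))    # covariance equals correlation after standardizing
print(np.isclose(d, scaled))         # d equals Euclidean distance divided by sqrt(2n)
```

If the sample ($1/(n-1)$) standard deviation were used for the standardization instead, the scaling constant would change accordingly, so the check is tied to the convention stated above.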
Now I'm wondering whether the properties of this function $d$ have been explored further. Is it a metric under a broader set of conditions than $z$-standardized data sets? Do you know of related literature or proofs?
Best Answer
Yes, there are quite a few related papers, which I came across a couple of weeks ago, that discuss converting Pearson correlation to Euclidean distance when the data is z-normalized.
Your question dates back to 2013, but I hope you are still interested:
StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time (the first paper to prove the relation and derive the formula).
Exact Discovery of Time Series Motifs
And a lot more.