I asked a question about forming a valid distance metric yesterday (Link1) and got some very good answers; however, I have a few more questions about forming a proper distance metric for high-dimensional data.
A. Why is the triangle inequality so important for a valid distance metric? Maybe this is too broad, but I don't have a simple example in mind. Could someone sketch a simple scenario that explains it with some context?
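One practical reason the triangle inequality matters is that it enables pruning in similarity search: if you know the distances to a pivot point, the bound $d(q,x) \geq |d(q,p) - d(p,x)|$ lets you skip points without ever computing $d(q,x)$. A minimal sketch (the function names `range_query` and `euclidean` are my own, not from any library):

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def range_query(points, q, r, pivot):
    """Find all points within radius r of q, pruning via a pivot."""
    d_qp = euclidean(q, pivot)
    results = []
    for x in points:
        d_px = euclidean(pivot, x)   # in a real index this would be precomputed
        # Triangle inequality gives d(q, x) >= |d(q, p) - d(p, x)|;
        # if that lower bound already exceeds r, skip the full computation.
        if abs(d_qp - d_px) > r:
            continue
        if euclidean(q, x) <= r:
            results.append(x)
    return results
```

Without the triangle inequality, no such lower bound exists, so every indexing structure that relies on this kind of pruning (metric trees, for example) breaks down.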
B. As mentioned in my previous post (Link1), I think cosine similarity is the same thing as the dot product. Am I right? If so, the dot product is not a valid distance metric because it lacks the triangle inequality property, among others. If we transform the similarity measured by the dot product into angular similarity, will the result be a proper distance metric?
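For what it's worth, the two notions can be compared numerically: cosine similarity is the dot product of the *normalized* vectors, the naive dissimilarity $1 - \cos$ can violate the triangle inequality, while the angular distance $\arccos(\cos)/\pi$ does satisfy it. A small self-contained check (function names are my own):

```python
import math

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)   # dot product of the normalized vectors

def cos_dist(a, b):
    return 1.0 - cosine_sim(a, b)

def angular_distance(a, b):
    # arccos of the cosine similarity, scaled to [0, 1]; this is a metric
    return math.acos(max(-1.0, min(1.0, cosine_sim(a, b)))) / math.pi

a, b, c = [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]
# 1 - cosine breaks the triangle inequality on these vectors:
#   cos_dist(a, c) = 1, but cos_dist(a, b) + cos_dist(b, c) ≈ 0.586
# angular distance satisfies it (here with equality, 0.5 = 0.25 + 0.25)
```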
C. Regarding the Euclidean distance, another post (Link2) says it is not a good metric in high dimensions. Since my data vectors live in a high-dimensional space, I am wondering: do some distance metrics suffer from the curse of dimensionality?
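The usual symptom is distance concentration: as dimensionality grows, the nearest and farthest neighbors become relatively indistinguishable. A quick illustrative experiment (the relative-contrast measure $(d_{\max} - d_{\min})/d_{\min}$ and the function name are my own choices, not from any particular reference):

```python
import math
import random

random.seed(0)

def concentration(dim, n=200):
    """Relative contrast (d_max - d_min) / d_min of Euclidean distances
    from the origin to n uniform random points in [0, 1]^dim."""
    dists = []
    for _ in range(n):
        p = [random.random() for _ in range(dim)]
        dists.append(math.sqrt(sum(x * x for x in p)))
    return (max(dists) - min(dists)) / min(dists)

# In 2 dimensions the contrast is large; in 500 dimensions distances
# concentrate around their mean and the contrast collapses toward 0.
```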
D. Regarding point C above, given the dimensionality, would a fractional distance metric be a better choice? (Link3)
Thanks very much! A
Best Answer
For high-dimensional data, shared-nearest-neighbor distances have been reported to work well.
Fractional distances are known not to be metrics: $L_p$ is a metric only for $p \geq 1$, and you'll find this restriction in every proof of the metric properties of Minkowski norms.
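A concrete counterexample makes the $p \geq 1$ restriction tangible. With $p = 0.5$ and the three points below, the direct distance exceeds the sum of the two legs, so the triangle inequality fails (a minimal sketch; the function name is my own):

```python
def minkowski(u, v, p):
    """Minkowski distance (sum of |u_i - v_i|^p) ** (1/p)."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

x, y, z = (0, 0), (1, 0), (1, 1)
p = 0.5
# d(x, y) = d(y, z) = 1, but d(x, z) = (1 + 1)^(1/0.5) = 4 > 2,
# so the triangle inequality d(x, z) <= d(x, y) + d(y, z) is violated.
```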