Solved – Clustering with asymmetrical distance measures

clusteringdistance

How do you cluster a feature with an asymmetrical distance measure?

For example, let's say you are clustering a dataset with days of the week as a feature – the distance from Monday to Friday is not the same as the distance from Friday to Monday.

How do you incorporate this into the clustering algorithm's distance measure?

Best Answer

If the M-F distance is asymmetric because the future is different from the past, then a genuine asymmetric clustering is called for. First, an asymmetric distance function must be defined.

One way to to asymmetric clustering, given a distance function, is to embed the original data into a new coordinate space. See "Geometrical Structures of Some Non-Distance Models for Asymmetric MDS" by Naohito Chino and Kenichi Shiraiwa, Behaviormetrika, 1992 (pdf). This is called HCM (the Hermitian Canonical Model).

Find a Hermitian matrix $H$, where $$ H_{ij} = \frac 1 2 [d(x_i, x_j) + d(x_j, x_i)] + i \frac 1 2 [d(x_i, x_j) - d(x_j, x_i)] $$ Find the eigenvalues and eigenvectors, then scale each eigenvector by the square root of its corresponding eigenvalue.

This transforms the data into a space of complex numbers. Once the data is embedded, the distance between objects x and y is just x * y, where * is the conjugate transpose. At this point you can run k-means on the complex vectors.

Spectral asymmetric clustering has also been done, see the thesis by Stefan Emilov Atev, "Using Asymmetry in the Spectral Clustering of Trajectories," University of Minnesota, 2011, which gives MATLAB code for a special algorithm.