Implementations of clustering with an asymmetric distance/similarity matrix

clustering methodology

In my clustering problem I'm working with a custom similarity measure, and I'm looking for implementations of algorithms that work with an asymmetric distance or similarity matrix. I'm only interested in those that accept a custom similarity/distance matrix as input, or a custom similarity measure function. The implementation language doesn't really matter as long as it works (R would be best; Python, C++, C, Java, or Ruby are also fine).

Methods I'm particularly interested in:

  1. Taylor-Butina clustering/grouping for asymmetric similarity. Here is an example of its usage. There are some R implementations, though I'm not sure whether they work with an asymmetric matrix.
  2. Tarjan's hierarchical clustering with strong components. Not to be confused with the strongly connected components algorithm.
  3. Spectral clustering using a custom affinity matrix. I'm especially interested in an R implementation of this one.
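To make the first item concrete, here is a minimal Python sketch of Taylor-Butina-style grouping on an asymmetric similarity matrix. It is an illustration under stated assumptions, not a reference implementation: the function name `butina_clusters`, the threshold, and the example matrix are all made up for this post, and the row-wise neighbor rule (j is a neighbor of i when `sim[i][j] >= threshold`) is just one possible way to handle asymmetry.

```python
def butina_clusters(sim, threshold):
    """Greedy Butina-style grouping on a (possibly asymmetric) similarity matrix.

    Assumption: sim[i][j] is the similarity of j as seen from i, and j is a
    neighbor of i when sim[i][j] >= threshold (row-wise convention).
    """
    n = len(sim)
    # Neighbor sets are built row by row, so asymmetry is preserved.
    neighbors = [
        {j for j in range(n) if j != i and sim[i][j] >= threshold}
        for i in range(n)
    ]
    # Visit items by decreasing neighbor count; ties go to the lower index.
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))

    assigned = set()
    clusters = []
    for centroid in order:
        if centroid in assigned:
            continue
        # The centroid claims all of its neighbors that are still free.
        members = [centroid] + sorted(neighbors[centroid] - assigned)
        clusters.append(members)
        assigned.update(members)
    return clusters


# Hypothetical asymmetric similarities: item 0 "sees" 1 and 2 as similar,
# but item 1 does not see 0 as similar.
example_sim = [
    [1.0, 0.9, 0.8, 0.1],
    [0.2, 1.0, 0.3, 0.1],
    [0.9, 0.2, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]
clusters = butina_clusters(example_sim, 0.7)
print(clusters)  # [[0, 1, 2], [3]]
```

Transposing the matrix (i.e. using the column-wise convention instead) gives a different grouping for the same data, so the direction in which the similarity is read has to be chosen deliberately.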

I found some of these algorithms implemented in the Grouping Module of Mesa Suite Version 2.0. However, I haven't found any trial version or download of this program.

Edit: The distance function satisfies the triangle inequality.

I think this post will be of interest to all cheminformatics people using R.

Best Answer

DBSCAN and OPTICS should work with asymmetric distances as well.

Don't use the DBSCAN implementation in R. It's incredibly slow.

The version in scipy probably doesn't allow you to use arbitrary distances, and it computes the full distance matrix, so it scales badly.

The DBSCAN and OPTICS implementations in ELKI are pretty good, and scale really well when you can use index structures.
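To illustrate why DBSCAN carries over to asymmetric distances, here is a minimal pure-Python sketch that runs on a precomputed matrix. It is not how ELKI or any other library implements it; the function name `dbscan_precomputed`, the parameters, and the example matrix are hypothetical. The one assumption that matters is the direction of the neighborhood: the eps-neighborhood of p is taken to be every q with `dist[p][q] <= eps`.

```python
from collections import deque

NOISE = -1


def dbscan_precomputed(dist, eps, min_pts):
    """Plain DBSCAN on a precomputed, possibly asymmetric, distance matrix.

    Assumption: the eps-neighborhood of p is {q : dist[p][q] <= eps}, read
    row-wise. A point counts itself as a neighbor when dist[p][p] <= eps.
    Returns one label per point; NOISE (-1) marks noise.
    """
    n = len(dist)
    labels = [None] * n  # None means "not visited yet"

    def region(p):
        return [q for q in range(n) if dist[p][q] <= eps]

    cluster = -1
    for p in range(n):
        if labels[p] is not None:
            continue
        seeds = region(p)
        if len(seeds) < min_pts:
            labels[p] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[p] = cluster
        queue = deque(q for q in seeds if q != p)
        while queue:
            q = queue.popleft()
            if labels[q] == NOISE:
                labels[q] = cluster  # border point: joins, but is not expanded
                continue
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_region = region(q)
            if len(q_region) >= min_pts:  # q is a core point: keep expanding
                queue.extend(q_region)
    return labels


# Hypothetical asymmetric distances: 4 is close to 3 when seen from 3
# (dist[3][4] = 1), but 3 is far when seen from 4 (dist[4][3] = 5).
example_dist = [
    [0, 1, 1, 9, 9],
    [2, 0, 1, 9, 9],
    [1, 1, 0, 9, 9],
    [9, 9, 9, 0, 1],
    [9, 9, 9, 5, 0],
]
labels = dbscan_precomputed(example_dist, eps=1.5, min_pts=2)
print(labels)  # [0, 0, 0, 1, 1]
```

This sketch scans a full row for every neighborhood query, so it is quadratic in the number of points and only suitable for small inputs.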
