What clustering algorithm can be used with a distance matrix and without features?

clustering

I have a dataset of binary files. I can't do feature extraction on them. Instead, I computed the distance between every pair of files in the dataset with a distance measure (NCD, Normalized Compression Distance), so I have a distance matrix.
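For reference, NCD compares the compressed sizes of two inputs with the compressed size of their concatenation: NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length in bytes. The sketch below assumes zlib as the compressor (the question does not say which one was used), and the helper names are hypothetical. Real compressors are imperfect, so NCD(x, x) comes out slightly above zero and NCD(x, y) can differ a little from NCD(y, x); this version averages both orders and zeroes the diagonal, which many clustering routines expect.

```python
import zlib
from itertools import combinations


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor, level 9)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_matrix(blobs: list[bytes]) -> list[list[float]]:
    """Symmetric NCD matrix with a zero diagonal.

    Both concatenation orders are averaged to remove the small asymmetry
    introduced by the compressor; diagonal entries stay at 0.0.
    """
    n = len(blobs)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = (ncd(blobs[i], blobs[j]) + ncd(blobs[j], blobs[i])) / 2
        dist[i][j] = dist[j][i] = d
    return dist
```

With each file read as bytes (for example `open(path, "rb").read()`), `ncd_matrix` returns the kind of square matrix the algorithms below take as input.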

My goal is to cluster these files. What is the best way to do that?

Best Answer

Many, many algorithms are based on distances only:

  • hierarchical clustering, with most linkages (single-link etc.)
  • DBSCAN
  • OPTICS
  • PAM (Partitioning Around Medoids, aka k-medoids)
  • Affinity propagation
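To illustrate how little the first item needs, here is a minimal sketch of single-link clustering that reads nothing but the distance matrix. It assumes you want flat clusters from a distance cut-off rather than the full dendrogram; `single_link_clusters` and `threshold` are hypothetical names. Cutting a single-link dendrogram at height t gives exactly the connected components of the graph that joins every pair at distance at most t, so a union-find pass is enough.

```python
def single_link_clusters(dist: list[list[float]], threshold: float) -> list[int]:
    """Flat single-link clusters from a precomputed square distance matrix.

    Objects i and j end up in the same cluster iff they are connected by a
    chain of pairs whose distances are all <= threshold.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i: int) -> int:
        # Walk to the root, halving the path as we go to keep trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the components of every pair that is close enough.
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)

    # Relabel roots as cluster ids 0, 1, 2, ... in order of first appearance.
    labels: dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]
```

In practice a library is usually the better choice: SciPy's `scipy.cluster.hierarchy.linkage` accepts a condensed distance matrix (convert a symmetric, zero-diagonal square matrix with `scipy.spatial.distance.squareform`) and `fcluster(..., criterion="distance")` cuts the tree, while scikit-learn's `DBSCAN` accepts the square matrix directly with `metric="precomputed"`.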

Of course there are also a number of methods that need coordinates. In particular:

  • Centroid-based methods such as k-means need coordinates to compute the centroid
  • Grid-based methods such as DENCLUE need coordinates to compute a grid
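The usual distance-only substitute for a centroid is the medoid: instead of averaging coordinates, pick the cluster member with the smallest total distance to the other members. This is the kind of representative that k-medoids methods such as PAM use, and it can be computed from the distance matrix alone. A minimal sketch, with `medoid` as a hypothetical helper name:

```python
def medoid(dist: list[list[float]], members: list[int]) -> int:
    """Return the member whose summed distance to the other members is smallest.

    Unlike a centroid, the medoid is always one of the actual objects, so it
    needs only pairwise distances, never coordinates. Ties go to the member
    listed first.
    """
    return min(members, key=lambda i: sum(dist[i][j] for j in members))
```

For a cluster of files this returns the index of the single most "central" file, which also makes a convenient representative to inspect by hand.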