Solved – When should I use k-means instead of Spectral Clustering

clustering, data mining, k-means, machine learning, spectral analysis

From the image linked below, it looks like spectral clustering does about as well as k-means when the data actually consist of K isotropic clusters, but for other, non-convex cluster shapes spectral clustering outperforms k-means. Is this true? When should I use k-means clustering instead of spectral clustering?

Also, to find clusters of the forms shown in rows 1 and 2, what similarity function do I need to use in conjunction with SpectralClustering?

Scikit-learn's clustering method comparison
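For concreteness, here is a minimal sketch of the set-up being asked about (my assumptions, not code from that comparison: rows 1 and 2 are taken to be the concentric circles and half-moon data sets, generated with scikit-learn's make_circles / make_moons, and SpectralClustering is given a k-nearest-neighbour affinity):

```python
from sklearn.datasets import make_circles, make_moons
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.metrics import adjusted_rand_score

datasets = {
    "circles (row 1)": make_circles(n_samples=500, factor=0.5, noise=0.05, random_state=0),
    "moons (row 2)": make_moons(n_samples=500, noise=0.05, random_state=0),
}

for name, (X, y_true) in datasets.items():
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    sc = SpectralClustering(
        n_clusters=2,
        affinity="nearest_neighbors",  # k-NN graph similarity; 'rbf' with a tuned gamma also works
        n_neighbors=10,
        random_state=0,
    ).fit_predict(X)
    # Adjusted Rand index vs. the generating labels: spectral clustering should be
    # near 1 on these shapes, while k-means cuts each ring/moon in half.
    print(f"{name}: k-means ARI={adjusted_rand_score(y_true, km):.2f}, "
          f"spectral ARI={adjusted_rand_score(y_true, sc):.2f}")
```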

Best Answer

k-means is much much much faster.

K-means is hard to beat performance-wise, so it will still work on much larger data sets. That is probably the key factor.

K-means is $O(n \cdot k \cdot d \cdot i)$, where $n$ is the number of points, $k$ the number of clusters, $d$ the dimensionality, and $i$ the number of iterations, i.e. linear in the number of points.

For large data sets, anything of $O(n^2)$ or worse is prohibitive.
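To put rough numbers on that (an illustrative back-of-the-envelope count, with assumed values for $n$, $k$, $d$ and $i$, not a measurement):

```python
# Back-of-the-envelope operation counts (assumed values: n = 1e6 points,
# k = 10 clusters, d = 10 dimensions, i = 100 iterations).
n, k, d, i = 10**6, 10, 10, 100

kmeans_ops   = n * k * d * i   # O(n*k*d*i): ~1e10, linear in n
pairwise_ops = n ** 2          # O(n^2):     ~1e12, already 100x more
spectral_ops = n ** 3          # O(n^3):     ~1e18, hopeless at this scale

print(f"k-means ~{kmeans_ops:.1e} ops")
print(f"O(n^2)  ~{pairwise_ops:.1e} ops")
print(f"O(n^3)  ~{spectral_ops:.1e} ops")
```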

Spectral clustering is in $O(n^3)$: it requires the eigendecomposition of an $n \times n$ affinity (Laplacian) matrix.

Which means it won't work for any reasonably large data set. It already took 7 seconds on a fast CPU for that second image - don't try this on larger data, you will not be happy.
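As a rough illustration of how the gap grows with $n$ (a sketch only; absolute timings depend on hardware and BLAS, and the 7-second figure above refers to the scikit-learn example, not this script):

```python
# Rough timing sketch (assumed setup). With the default affinity='rbf',
# SpectralClustering forms a dense n x n kernel matrix and eigendecomposes it,
# so its cost blows up quickly, while k-means stays linear in n.
import time
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, SpectralClustering

for n in (500, 1000, 2000, 4000):
    X, _ = make_moons(n_samples=n, noise=0.05, random_state=0)

    t0 = time.perf_counter()
    KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    t_km = time.perf_counter() - t0

    t0 = time.perf_counter()
    SpectralClustering(n_clusters=2, random_state=0).fit(X)
    t_sc = time.perf_counter() - t0

    print(f"n={n:5d}  k-means: {t_km:6.3f}s  spectral: {t_sc:7.3f}s")
```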

P.S. That image is outdated. The current version can be found in the sklearn documentation (I'm not embedding it here, as I don't know whether the image can be licensed CC-BY-SA-3.0; your image upload may be violating copyright, although I doubt you'll get into trouble ...)

Note the runtime information. k-means and DBSCAN take <0.02s on each of these tiny toy data sets, whereas spectral clustering is 23-734 times slower. Only affinity propagation is similarly bad.
