Solved – What are the advantages of Louvain method versus K-means for clustering sparse data

clustering, k-means, modularity

I would like to better understand the strengths of the Louvain method versus K-means for high-dimensional sparse data (e.g. zero-inflated negative binomial gene expression counts or natural language processing matrices).

A common procedure is to reduce dimensionality with PCA and then cluster in principal component space. In this context, what is the main value of the Louvain method versus K-means?

From How to understand the drawbacks of K-means, aside from the obvious advantage of not having to fix the number of clusters K in advance (although the Louvain method does rely on other parameters, such as the number of nearest neighbors used to build the graph), I conclude that the Louvain method, unlike K-means, does not assume that clusters have equal size, density, or shape.

Is this intuition correct?

Best Answer

Your intuition is correct. The Louvain method is non-parametric and requires no prior assumptions about the graph.
However, the main difference is that K-means (and most other clustering methods) works on data points embedded in some space, while Louvain works on data points connected by a graph. If you have points in some space and want to create a graph out of them, the graph construction itself relies on some heuristic or assumption.
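
To make the graph-construction step concrete, here is a minimal sketch, assuming scikit-learn and a recent networkx (2.8 or later) are available. The synthetic blobs stand in for PCA-reduced data, and the k-nearest-neighbor graph with k = 15 is exactly the kind of heuristic choice mentioned above, not something the Louvain method prescribes.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import louvain_communities
from sklearn.datasets import make_blobs
from sklearn.neighbors import kneighbors_graph

# Synthetic stand-in for PCA-reduced data (an assumption for illustration):
# three well-separated groups of 100 points each in 10 dimensions.
centers = np.zeros((3, 10))
centers[1, 0] = 20.0
centers[2, 1] = 20.0
X, y = make_blobs(n_samples=300, centers=centers, cluster_std=1.0, random_state=0)

# Heuristic step: connect every point to its k nearest neighbors.
# k is a free parameter and influences which communities are found.
k = 15
A = kneighbors_graph(X, n_neighbors=k, mode="connectivity", include_self=False)
A = A.maximum(A.T)  # symmetrize so the graph is undirected

# Louvain operates on the graph only; it never sees the coordinates in X.
G = nx.from_scipy_sparse_array(A)
communities = louvain_communities(G, seed=0)

# Convert the list of node sets into one integer label per data point.
labels = np.full(X.shape[0], -1, dtype=int)
for community_id, nodes in enumerate(communities):
    labels[list(nodes)] = community_id

print("number of communities:", len(communities))
```

Note that the number of communities is an output here rather than an input, but it still depends on k and on how the graph is symmetrized.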

If a graph representation is already a 'natural' representation for your data (maybe gene expression counts have a clear correlation network structure?), the Louvain method is appropriate.

If not, you might want to look at, for example, DBSCAN, which also doesn't require the number of clusters (but does require some other parameters).
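
As a rough illustration of the DBSCAN alternative, the sketch below uses scikit-learn's `DBSCAN` on synthetic 2-D blobs. The values `eps=1.0` and `min_samples=5` are assumptions chosen for this toy data; on real data they would need tuning.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Toy data (an assumption for illustration): three well-separated 2-D groups.
centers = np.array([[0.0, 0.0], [20.0, 0.0], [0.0, 20.0]])
X, y = make_blobs(n_samples=300, centers=centers, cluster_std=1.0, random_state=0)

# eps is the neighborhood radius and min_samples the density threshold;
# these replace the "number of clusters" parameter that K-means needs.
db = DBSCAN(eps=1.0, min_samples=5).fit(X)
labels = db.labels_  # label -1 marks points treated as noise

n_clusters = len(set(labels.tolist()) - {-1})
n_noise = int(np.sum(labels == -1))
print("clusters found:", n_clusters, "noise points:", n_noise)
```

Unlike K-means, DBSCAN can leave low-density points unassigned, so the labels are not guaranteed to cover every data point.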