See, even hierarchical clustering needs parameters if you want to get a partitioning out. In fact, hierarchical clustering has (roughly) four parameters: 1. the actual algorithm (divisive vs. agglomerative), 2. the distance function, 3. the linkage criterion (single-link, Ward, etc.) and 4. the distance threshold at which you cut the tree (or some other extraction method).
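Here is a rough sketch of where those four choices show up in code (SciPy is just my choice for illustration, not the only option; the data and threshold are made up):

```python
# Hedged sketch: SciPy only implements the agglomerative variant (choice 1);
# the distance function and linkage criterion are arguments to linkage()
# (choices 2 and 3), and the cut threshold goes into fcluster() (choice 4).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                      # toy data

Z = linkage(X, method="ward", metric="euclidean")  # distance + linkage
labels = fcluster(Z, t=5.0, criterion="distance")  # cut threshold
print(np.unique(labels))                           # the resulting partitioning
```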
Fact is, there is no good "push button" solution to cluster analysis. It is an exploratory technique, which means you have to try different methods and parameters and analyze the results.
I found DBSCAN to be very usable in most cases. Yes, it has two parameters (the distance threshold, a.k.a. the neighbor predicate, and minpts, a.k.a. the core predicate) - I'm not counting the distance function separately this time, because what is really needed is an "is neighbor of" binary predicate; see GDBSCAN.
The reason is that in many applications you can choose these values intuitively if you have understood your data well enough. E.g. when working with geo data, distance is literally in kilometers, which allows me to intuitively specify the spatial resolution.
Similarly, minpts gives me an intuitive control over how "significant" a subset of observations needs to be before it becomes a cluster.
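For example, a quick sketch of what I mean with geo data (scikit-learn is just my choice of tool here; the coordinates and the parameter values are made up for illustration):

```python
# Hedged sketch: eps is stated directly in kilometers, and min_samples
# controls how many points a cluster needs before it is "significant".
import numpy as np
from sklearn.cluster import DBSCAN

coords_deg = np.array([[48.137, 11.575],    # hypothetical lat/lon points
                       [48.139, 11.580],
                       [48.135, 11.570],
                       [52.520, 13.405]])
coords_rad = np.radians(coords_deg)         # haversine works in radians

earth_radius_km = 6371.0
eps_km = 1.0                                # spatial resolution: 1 km
db = DBSCAN(eps=eps_km / earth_radius_km,
            min_samples=3,                  # "significance" of a cluster
            metric="haversine",
            algorithm="ball_tree").fit(coords_rad)
print(db.labels_)                           # -1 marks noise points
```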
Usually, when you find DBSCAN hard to use, it is because you have not understood "distance" on your data yet. You then first need to figure out how to measure distance and what the resulting numbers mean to you. Then you'll know the threshold to use.
And in the end, go and try stuff out. It's data exploration, not "return(truth);". There is no "true" clustering. There are only "obvious", "useless" and "interesting" clusterings, and these qualities cannot be measured mathematically; they are subjective to the user.
You can find plenty of discussion on k-means with binary variables here on the Stack Exchange network.
Binary variables are less of a problem for DBSCAN because it does not minimize least squares, it does not compute the mean, and it can be used with arbitrary measures such as Jaccard distance for binary variables (it can even be used with similarities, non-metric measures, kernels, etc.).
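A minimal sketch of that (scikit-learn/SciPy again as my illustration; the toy data and the eps/minpts values are made up):

```python
# Hedged sketch: DBSCAN on binary data with Jaccard distance,
# fed in as a precomputed distance matrix.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import DBSCAN

X = np.array([[1, 0, 1, 1, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 1],
              [0, 1, 0, 1, 0]], dtype=bool)

D = squareform(pdist(X, metric="jaccard"))   # pairwise Jaccard distances
db = DBSCAN(eps=0.5, min_samples=2, metric="precomputed").fit(D)
print(db.labels_)
```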
But that doesn't mean the data will just work. If you have binary variables, there probably is no good choice of epsilon, because there won't be any small distance values. Consider Hamming distance: if epsilon is larger than 1 (i.e. one variable is allowed to be different), you probably already get everything connected. That is due to a lack of "resolution" in the data itself, and is probably not something the algorithm or the distance function can fix.
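You can see this resolution problem directly in the pairwise distances (a small illustration, assuming NumPy/SciPy; note that SciPy's "hamming" is normalized to [0, 1]):

```python
# With d binary variables, the normalized Hamming distance can only take the
# values 0/d, 1/d, ..., d/d - there is very little room to place epsilon.
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(50, 5)).astype(bool)  # 50 points, 5 binary variables

dists = pdist(X, metric="hamming")
print(np.unique(dists))  # only six distinct values can occur: 0, 0.2, ..., 1.0
```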
Correlated binary attributes (e.g. separate "male" and "female" columns that encode the same property) put double weight on that property. That is probably not desirable.
There are many, many extensions of DBSCAN. I'm pretty sure there is one for categorical data. I know there is PreDeCon for axis-aligned subspaces, and something like 4C for correlated continuous attributes. So you will have to do some research on what has been published. Don't assume a software package contains everything that exists.
The only measure that I know of that takes noise points and density into account is DBCV (Density-Based Clustering Validation).
I haven't used it though.