Solved – Outlier detection for clustering methods

Tags: clustering, data preprocessing, k-means, outliers, self organizing maps

I'm in the middle of analysing results for some clustering methods. I'm running quality tests on different clustering outputs that all come from a single input dataset, with the data preprocessing and cleaning methods swapped between runs.

So far, the clustering outputs from datasets where any outlier detection technique has been applied perform poorly. Hence, I was wondering whether it's worth applying an outlier detection technique before clustering at all. My particular results say it isn't, but I'd like to hear your opinions from a wider perspective.

In case it matters, the clustering methods used are K-means, self-organizing maps (SOM) and hierarchical clustering. Thanks!!

Best Answer

It really depends on your data, the clustering algorithm you use, and your outlier detection method. Consider the K-means algorithm. If your dataset has "outliers", they can affect the clustering result by shifting the cluster centers. Be careful not to confuse outliers with noisy data points. Noise is a random effect on the data and can appear in all directions. Outliers are single, mostly isolated data points that lie far from the rest of the data.
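To make the effect on the cluster centers concrete, here is a minimal sketch of plain K-means (Lloyd's algorithm) on made-up 1-D data. The function name `kmeans_1d`, the toy values, and the min/max initialisation are all assumptions chosen for illustration, not anything taken from the question's dataset.

```python
def kmeans_1d(points, centers, iters=100):
    """Plain Lloyd's algorithm on 1-D data; returns the sorted final centers."""
    centers = list(centers)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # Update step: each center moves to the mean of its points
        # (an empty group keeps its previous center).
        new_centers = [sum(g) / len(g) if g else c
                       for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


# Two tight, well-separated groups (toy data, assumed for illustration).
clean = [0.8, 0.9, 1.0, 1.1, 1.2, 4.8, 4.9, 5.0, 5.1, 5.2]
# The same data plus one isolated point far from everything else.
with_outlier = clean + [50.0]

centers_clean = kmeans_1d(clean, centers=[min(clean), max(clean)])
centers_outlier = kmeans_1d(with_outlier,
                            centers=[min(with_outlier), max(with_outlier)])

print("without outlier:", centers_clean)
print("with outlier:   ", centers_outlier)
```

In this toy case the clean data gives one center near each group (about 1 and 5). With the single far-away point added, one center is spent on that point alone and the other lands between the two real groups (about 3), so the groups are merged. Real data will rarely be this clear-cut, but the mechanism is the same.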

If you do not have outliers, outlier detection can hurt your data by removing small clusters or by removing only part of the scattered noise.
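The following sketch shows how a small but genuine cluster can be thrown away. It assumes a simple z-score rule with a threshold of 3, which is only one possible outlier detection method and not necessarily the one used in the question; the data are again made up.

```python
import statistics

# One large group spread evenly over [-1, 1] and one small, tight group
# near 10 (toy data, assumed for illustration). The small group is a real
# cluster, not a set of isolated points.
large = [-1 + 2 * i / 59 for i in range(60)]
small = [9.8, 10.0, 10.2]
data = large + small

mean = statistics.fmean(data)
std = statistics.pstdev(data)

# Assumed rule: flag every point more than 3 standard deviations from the mean.
threshold = 3.0
kept = [x for x in data if abs(x - mean) / std <= threshold]
removed = [x for x in data if abs(x - mean) / std > threshold]

print("kept:   ", len(kept), "points")
print("removed:", removed)
```

Here the rule removes exactly the three points of the small group and keeps all of the large one, so any clustering run on the "cleaned" data can no longer find the second cluster.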
