Solved – Best clustering technique for outlier detection

clusteringk-meansoutliers

I have around 15-20 points every second, and I would like to detect outliers based on

-their density along x-axis , that means if I am using k-mean clustering then I specify that in x-direction max of a variance is tolerated and in y max of b is tolerated. and a << b for this case.

Hence I get elliptical clusters.

Moreover, I think that clustering algorithms based on density clustering will suit my problem or if you recommend any other please suggest.

In the following figure, the most I am interested in the points which are near to -10 at x-axis and I would like to retain them rest the ones >-12. I want to discard them, but I would need a general clustering method to do it for all the data, as the point (-10) in this specific case, changes from season to season.

Thank you for your time, and please let me know if any further info is required or you have any tips . It would be most welcome

Plot 2
Plot 1

Best Answer

Why do you insist on using clustering?

To me it sounds as if simple k-nearest-neighbor distance (and in fact, nearest-neighbor distance) should just work for you. And that is a simple as it can get.

Note that it is trivial to add weights $\Omega=\{\omega_1,\ldots,\omega_d\}$ into Euclidean distance:

$$ \text{Euclidean}_\Omega(x,y) := \sqrt{\sum_{i=1}^{d} \omega_i(x_i-y_i)^2} $$

Related Question