I have around 15-20 points every second, and I would like to detect outliers based on
-their density along x-axis , that means if I am using k-mean clustering then I specify that in x-direction max of a variance is tolerated and in y max of b is tolerated. and a << b for this case.
Hence I get elliptical clusters.
Moreover, I think that clustering algorithms based on density clustering will suit my problem or if you recommend any other please suggest.
In the following figure, the most I am interested in the points which are near to -10 at x-axis and I would like to retain them rest the ones >-12. I want to discard them, but I would need a general clustering method to do it for all the data, as the point (-10) in this specific case, changes from season to season.
Thank you for your time, and please let me know if any further info is required or you have any tips . It would be most welcome
Best Answer
Why do you insist on using clustering?
To me it sounds as if simple k-nearest-neighbor distance (and in fact, nearest-neighbor distance) should just work for you. And that is a simple as it can get.
Note that it is trivial to add weights $\Omega=\{\omega_1,\ldots,\omega_d\}$ into Euclidean distance:
$$ \text{Euclidean}_\Omega(x,y) := \sqrt{\sum_{i=1}^{d} \omega_i(x_i-y_i)^2} $$