Solved – Anomaly detection on 1D data with multiple Gaussian distributions

anomaly detection, gaussian mixture distribution, normal distribution, outliers

My core problem is to set a cutoff on my one-dimensional data that separates normal values from abnormal ones. I think this is an 'anomaly detection' problem.

My Data

My data is one-dimensional and consists of the following:

  1. (Normal) m different Gaussian distributions with small mu, where m is 1 to 3 in most cases. Fig.1 shows two normal Gaussian distributions in red.
  2. (Abnormal) n different unknown distributions (mainly Gaussian, I think) with large mu, where n is unknown. They are not visible in Fig.1 because they are sometimes too small, maybe only a few points.
  3. (Abnormal) Noise points.

Fig.1 TLEN distribution in one sample

I want to set a cutoff (on the x axis) to separate normal from abnormal.

My Solutions

After searching and asking around, I came up with the following solution:

Find all the Gaussian distributions in the normal data and use the maximum value of mu + 3 sigma across them as the cutoff (the three-sigma rule).

  1. First, I use some outlier detection method to remove most of the abnormal points, so that the remaining data is mainly normal.
  2. Then I use KDE to find how many peaks the remaining data has, and use this value as the number of Gaussian distributions in the normal data.
  3. Finally, I use a GMM to fit the remaining data and take the maximum mu + 3 sigma value.
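Steps 2 and 3 above could be sketched roughly as follows, using SciPy and scikit-learn. This is only an illustration under several assumptions: the data has already been cleaned by step 1, the synthetic sample merely stands in for the real TLEN values, the helper names `count_kde_peaks` and `three_sigma_cutoff` are hypothetical, and the 10% prominence threshold is an arbitrary choice that would need tuning.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import gaussian_kde
from sklearn.mixture import GaussianMixture


def count_kde_peaks(x, grid_size=512, min_rel_prominence=0.1):
    """Step 2: count the local maxima of a Gaussian KDE evaluated on a grid."""
    kde = gaussian_kde(x)  # bandwidth from Scott's rule (SciPy default)
    grid = np.linspace(x.min(), x.max(), grid_size)
    density = kde(grid)
    # Keep only peaks that stand out clearly; the threshold is an assumption.
    peaks, _ = find_peaks(density, prominence=min_rel_prominence * density.max())
    return max(len(peaks), 1)


def three_sigma_cutoff(x, n_components, random_state=0):
    """Step 3: fit a 1D GMM and return max over components of mu + 3 * sigma."""
    gmm = GaussianMixture(n_components=n_components, random_state=random_state)
    gmm.fit(x.reshape(-1, 1))
    mu = gmm.means_.ravel()
    # With the default covariance_type="full", covariances_ has shape (k, 1, 1).
    sigma = np.sqrt(gmm.covariances_.ravel())
    return float(np.max(mu + 3.0 * sigma))


# Synthetic stand-in for already-cleaned data: two "normal" Gaussians (assumed).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(100, 10, 2000), rng.normal(200, 15, 2000)])

n_peaks = count_kde_peaks(x)
cutoff = three_sigma_cutoff(x, n_peaks)
print(n_peaks, round(cutoff, 1))
```

The KDE peak count depends strongly on the bandwidth, so on real data it may be worth comparing it with a model-selection criterion such as `GaussianMixture.bic` before trusting the number of components.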

My problem is mainly with step 1, which does not work well. I have tried:

a) LOF (not stable; sometimes it flags 99% of the points as outliers).
b) DBSCAN (it struggles to group the normal data into just a few clusters).
c) GMM, of course (using 2 Gaussian components to separate normal from abnormal; it works well when m = 1 but fails in the other cases).

Best Answer

1) Can't you use classification to separate abnormal from normal data?
2) LOF gives a local outlier factor value for each point, and that value depends on the number of neighbours you specify. You can still select only the top 10 or 100 observations as outliers, so there should not be a case where 99% of the data points are outliers. Try increasing the LOF value from 2 to maybe 5.
3) There are many other unsupervised techniques for your step 1; see https://machinelearningstories.blogspot.com/2018/07/anomaly-detection-anomaly-detection-by.html. I can share some more material if you need it.
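To illustrate point 2, here is a minimal sketch of ranking points by LOF score and dropping only the top k, rather than relying on LOF's own inlier/outlier labels. It uses scikit-learn's `LocalOutlierFactor`; the helper name `top_k_lof_outliers`, the values of `k` and `n_neighbors`, and the synthetic data are all assumptions made for the example.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor


def top_k_lof_outliers(x, k=10, n_neighbors=20):
    """Return the indices of the k points with the highest LOF score."""
    lof = LocalOutlierFactor(n_neighbors=n_neighbors)
    lof.fit(x.reshape(-1, 1))
    # scikit-learn stores the negated LOF, so flip the sign:
    # a larger score then means a stronger outlier.
    scores = -lof.negative_outlier_factor_
    return np.argsort(scores)[-k:][::-1]


# Synthetic example (assumed): one normal Gaussian plus three far-away points.
rng = np.random.default_rng(0)
normal = rng.normal(100, 10, 1000)
abnormal = np.array([400.0, 650.0, 900.0])
x = np.concatenate([normal, abnormal])

outlier_idx = top_k_lof_outliers(x, k=3)

# Remove only the selected points; everything else is kept as "mainly normal".
mask = np.ones(len(x), dtype=bool)
mask[outlier_idx] = False
cleaned = x[mask]
```

Fixing k caps how many points can ever be removed, which avoids the situation where almost everything is flagged; the trade-off is that k has to be chosen by hand.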