Standard Deviation – Estimator Without Outliers

median, outliers, standard deviation

I have samples that are distributed like this:

[Figure: distribution of the samples]

I want to calculate the standard deviation (or something similar) of the main peak, excluding the outliers. Of course, I can do this by simply applying a cut at, say, -5µ. I am wondering, however, whether there is a better way: another statistic that automatically discards the outliers and returns a value similar to the $\sigma$ of the main peak.

By analogy, when I need to calculate the mean of something like this, I use the median instead. For this data specifically, the median is 1.58e-6 while the mean is -8.63e-6. As seen in the following zoomed plot, the median is a very good estimate of the mean of the "Gaussian":

[Figure: zoomed plot of the main peak]

I am looking for something similar but for the standard deviation instead.

The good thing about the "median trick for the mean" is that it does not matter if there is one point at 99999: since it is a single point, the median simply ignores it. For the standard deviation, if I apply a cut at $\alpha \sigma$ with $\alpha$ some value (say 2), the problem is that a single value at 99999 inflates $\sigma$ itself, so the cut fails.

Best Answer

I would suggest the Median Absolute Deviation/Error metric (MdAE).
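
For reference, a sketch of the definition (the notation $x_1, \dots, x_n$ for the samples and $\hat\sigma$ for the resulting estimate is my own, not taken from the question):

$$\mathrm{MAD} = \operatorname{median}_i \left| x_i - \operatorname{median}_j x_j \right|, \qquad \hat\sigma = 1.4826 \cdot \mathrm{MAD}.$$

The factor $1.4826 \approx 1/\Phi^{-1}(3/4)$ makes $\hat\sigma$ a consistent estimate of $\sigma$ when the bulk of the data is Gaussian. R's mad() applies this factor by default.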

Consider the following made-up data (example in R):

x <- c(4, 7, 2, 4, 6, 4, 8, 5, 4, 6, 1000)

It contains one big outlier. Now consider the following metrics:

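# Location and spread estimates, rounded to two decimals for display
round(c("mean"=mean(x),
  "median"=median(x),
  "MSD"=sd(x),                                # ordinary standard deviation
  "MdSD"=sqrt(median((x-mean(x))^2)),         # root median squared deviation from the mean
  "MdMdSD"=sqrt(median((x-median(x))^2)),     # root median squared deviation from the median
  "MAD"=sum(abs(x-median(x)))/(length(x)-1),  # mean absolute deviation from the median (n-1 denominator)
  "MdAD"=median(abs(x-median(x))),            # median absolute deviation from the median
  "MADR"=mad(x)                               # R's mad(): MdAD scaled by 1.4826
), 2)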

Results on the original data:

  mean median    MSD   MdSD MdMdSD    MAD   MdAD   MADR 
 95.45   5.00 300.01  91.45   1.00 100.90   1.00   1.48 

Results on the data after removing the outlier (the value 1000):

  mean median    MSD   MdSD MdMdSD    MAD   MdAD   MADR 
  5.00   4.50   1.76   1.00   1.12   1.56   1.00   1.48
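
As a rough check on data shaped more like the plots in the question, here is a minimal sketch on synthetic samples. Everything about the data is an assumption made for illustration (the peak width `sigma_true`, the one-sided tail, the sample sizes, and the single point at 99999); it is not the asker's data.

```r
# Synthetic data (assumed, not the samples from the question):
# a Gaussian main peak, a one-sided tail of outliers, and one extreme point.
set.seed(42)
sigma_true <- 2e-6
peak     <- rnorm(10000, mean = 1.5e-6, sd = sigma_true)
tail_pts <- runif(500, min = -3e-4, max = -5e-6)
x <- c(peak, tail_pts, 99999)

# Compare the ordinary standard deviation with the scaled MAD.
estimates <- c(
  "sd"  = sd(x),   # dominated by the point at 99999
  "mad" = mad(x)   # 1.4826 * median(abs(x - median(x)))
)
print(signif(estimates, 3))
print(signif(estimates / sigma_true, 3))  # ratio to the true peak width
```

With this setup, sd(x) should be driven almost entirely by the single extreme point, while mad(x) should stay close to `sigma_true`. Because the tail is one-sided and makes up a few percent of the samples, expect the MAD to come out slightly above the true width rather than exactly on it.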