I have samples that are distributed like this:
I want to calculate the standard deviation (or similar) of the main peak without the outliers. Of course I can do this by just applying a cut at, say, -5µ. I am wondering, however, if there is a better way, using another statistic that automatically gets rid of the outliers and returns a value similar to the $\sigma$ of the main peak.
In analogy to this, when I need to calculate the mean of something like this I use the median instead of the mean. For this data specifically, the median is 1.58e-6 while the mean is -8.63e-6. As seen in the following zoomed plot, the median is a very good estimate of the mean of the "Gaussian":
I am looking for something similar but for the standard deviation instead.
The good thing about the "median trick for the mean" is that a single point at 99999 does not matter: because it is just one point, the median effectively ignores it. For the standard deviation, if I apply a cut at $\alpha \sigma$ for some value of $\alpha$ (say 2), the problem is that a value at 99999 will make this fail.
Best Answer
I would suggest the Median Absolute Deviation/Error metric (MdAE).
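For reference, this is the usual definition of the median absolute deviation (MAD) of samples $x_1, \dots, x_n$, written out as a sketch. The scaling constant is an assumption that only holds if the main peak is roughly Gaussian; for other shapes a different constant would be needed.

$$
\mathrm{MAD} = \operatorname{median}_i \left( \left| x_i - \operatorname{median}_j (x_j) \right| \right),
\qquad
\hat{\sigma} \approx 1.4826 \cdot \mathrm{MAD}
$$

The factor $1.4826 \approx 1 / \Phi^{-1}(3/4)$ makes $\hat{\sigma}$ comparable to the standard deviation for normally distributed data. Like the median itself, the MAD is unaffected by a single point at 99999.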
Consider the following made-up data (example in R)
with one big outlier, and consider the following metrics
results on the original data
results on the data after removing the outlier
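As an illustration of the comparison described above, here is a minimal Python sketch (not the R example from the answer). The data values are hypothetical and chosen only for illustration, the helper name `mad_sigma` is made up, and the scale factor 1.4826 assumes the main peak is roughly Gaussian.

```python
import statistics


def mad_sigma(values, scale=1.4826):
    """Scaled median absolute deviation about the median.

    With scale=1.4826 the result is comparable to the standard
    deviation when the bulk of the data is roughly Gaussian.
    """
    med = statistics.median(values)
    abs_dev = [abs(v - med) for v in values]
    return scale * statistics.median(abs_dev)


# Hypothetical samples (NOT the data from the answer): a narrow peak
# around zero, plus one huge outlier in the second list.
clean = [-1.2, -0.7, -0.3, -0.1, 0.0, 0.2, 0.4, 0.6, 0.9, 1.3]
with_outlier = clean + [99999.0]

for label, data in (("with outlier", with_outlier), ("outlier removed", clean)):
    sd = statistics.stdev(data)   # sample standard deviation
    robust = mad_sigma(data)      # MAD-based estimate of sigma
    print(f"{label:16s} sd = {sd:12.3f}   mad_sigma = {robust:.3f}")
```

On data like this, the sample standard deviation jumps to the tens of thousands once the outlier is included, while the MAD-based estimate stays below 1 in both cases, so no manual cut is needed.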