A measure that corresponds to variance but is robust against outliers

descriptive-statistics, outliers, variance

One theory holds that the variance of a particular measure (call it $x$) should increase linearly over time.

I've collected a dataset in order to test this claim of the theory. What I've found, however, is that $x$ is almost always very small, but is occasionally quite large. The outliers in my dataset (the rare large values of $x$) seem to be causing fluctuations in the variance over time (independent datasets were collected for each time point) that make it difficult to see a clear pattern of increase.

What would statisticians recommend I do to reduce my analysis's vulnerability to outliers, while still addressing the claim of the theory I'm interested in?

One solution to reduce the influence of outliers would be to compute something like the interquartile range. But then, would I be looking for a linear increase in the interquartile range? (It seems to me the answer is no, but I'm not sure what to do about this.)

Another solution might be to fit a Gaussian distribution to my data, and derive the variance from the best-fitting width parameter. But this makes the assumption that my data are distributed normally.

Maybe the easiest thing to do would be to use some threshold for cutting outliers entirely from the dataset. But this seems like cheating to me. (Maybe it isn't cheating.)

Best Answer

The median absolute deviation (MAD) is one generally accepted measure of the spread of data points, robust in the sense that it is insensitive to the exact values of outliers unless outliers make up more than half of the observations (a 50% breakdown point). It is a very useful alternative to the variance/standard deviation in cases like yours. There are other robust measures of spread (scale) as well; the robust-statistics literature on scale estimators discusses several further options.
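As a concrete illustration (not part of the original answer), here is a minimal Python sketch that computes a scaled MAD per time point and squares it. If the distribution's shape is fixed and only its scale grows, a squared scale estimate tracks the variance, so a linear increase in variance should show up as a roughly linear trend in the squared MAD. The data-generating step and the 1.4826 normal-consistency factor are illustrative assumptions.

```python
import numpy as np

def mad(x, scale=1.4826):
    """Median absolute deviation; scale=1.4826 makes it estimate sigma for normal data."""
    x = np.asarray(x)
    return scale * np.median(np.abs(x - np.median(x)))

# Hypothetical data: one independent, heavy-tailed sample of x per time point,
# with variance growing linearly in t.
rng = np.random.default_rng(0)
samples_by_time = [rng.standard_t(df=3, size=200) * np.sqrt(1 + t) for t in range(10)]

# Robust analogue of the variance at each time point: the squared scaled MAD.
robust_variances = [mad(s) ** 2 for s in samples_by_time]
print(robust_variances)
```

One could then fit a straight line to `robust_variances` against time to check the theory's linear-increase claim without the fit being dominated by the occasional large value of $x$.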
