Measure for deviation/error that is guaranteed to be smaller than the mean

data visualization, error, mean-absolute-deviation, standard deviation

I am presenting some non-negative values by using a mean value with error bars, basically something like this:

[Image: plot of mean values shown as dots with vertical error bars]

So the mean values are shown as dots and the error bars are displayed vertically, centered around the dots. This means that the value of the lower error bar is at MEAN-ERROR and the value of the upper error bar is at MEAN+ERROR.

A candidate for the error measure would be the standard deviation. However, this would make it possible for the lower error bar to fall below zero (since it is possible that MEAN < STANDARD DEVIATION), which might be counter-intuitive.

Is there some error measure which is guaranteed to be smaller than the mean value, and therefore never has error bars below zero?

Due to some external requirements, I can't use median/quartile-based measures. However, if needed, something directed (with separate negative/positive errors) can be used.

Best Answer

So, you know that the true value of this unknown quantity must be greater than zero? This seems like the perfect time to incorporate a prior.

Let's call the random variable $X$. Since we know $X$ cannot be less than zero, you are correct in saying that reporting a value of $X$ as, for example, $X=0.23 \pm 0.44$, does not make much sense. I think in this case, it's perfectly acceptable to report the error as $X=0.23^{+0.44}_{-0.23}$. It's not under-reporting the error; it's incorporating real knowledge we have about the distribution we are studying.
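As a rough illustration of the asymmetric reporting described above, here is a minimal sketch that clips the lower error bar at zero. The function name `clipped_error_bars` and the sample values are hypothetical, and using the sample standard deviation as the error is an assumption; any other symmetric error measure could be clipped the same way.

```python
import statistics

def clipped_error_bars(values):
    """Return (mean, lower_error, upper_error) for non-negative data.

    The upper error is the sample standard deviation; the lower error is
    capped at the mean so that mean - lower_error never drops below zero.
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)   # sample standard deviation (n - 1 denominator)
    lower_error = min(sd, mean)     # clip so the lower bar stops at zero
    upper_error = sd
    return mean, lower_error, upper_error

# Hypothetical non-negative sample whose standard deviation exceeds its mean
sample = [0.0, 0.0, 0.1, 0.2, 0.85]
mean, lower_error, upper_error = clipped_error_bars(sample)
print(f"X = {mean:.2f} +{upper_error:.2f} / -{lower_error:.2f}")
```

If the plot is made with matplotlib, asymmetric bars like these can be passed to `errorbar` as `yerr=[lower_errors, upper_errors]`.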

There are a couple of other ways to approach this problem, too. Personally, I like looking for a $68\%$ (or $95\%$) confidence interval - but if your external requirements don't allow that, that's fine.
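The answer does not say how such an interval would be computed. One possible reading, sketched here, is an empirical central interval between the 16th and 84th percentiles of the data (roughly $68\%$ coverage). The function name `percentile_interval` and the sample are hypothetical, and because this is a quantile-based measure it may not satisfy the asker's constraints.

```python
import statistics

def percentile_interval(values, lower_pct=16, upper_pct=84):
    """Return the (lower, upper) empirical percentiles of the data.

    The "inclusive" method interpolates between observed values only, so
    both bounds stay within the range of the data and cannot go below
    zero when the data are non-negative.
    """
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    # cuts[k - 1] is the k-th percentile (cuts holds percentiles 1..99)
    return cuts[lower_pct - 1], cuts[upper_pct - 1]

# Hypothetical non-negative sample
sample = [0.0, 0.0, 0.1, 0.2, 0.85]
lower, upper = percentile_interval(sample)
print(f"central interval: [{lower:.3f}, {upper:.3f}]")
```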

Another trick you can use is to transform your data into log-space first, then compute the mean and standard deviation of $\log(X)$ before transforming back. This will ensure that the upper and lower estimates of $X$ are both greater than zero, and it is the more sensible thing to do if it looks like your data follow a log-normal distribution. Note that this requires every value to be strictly positive, since $\log(0)$ is undefined.
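A minimal sketch of the log-space approach, assuming all values are strictly positive. The function name `log_space_error_bars` and the sample are hypothetical. Note that the back-transformed centre is the geometric mean rather than the arithmetic mean, so the dot on the plot would move accordingly.

```python
import math
import statistics

def log_space_error_bars(values):
    """Return (center, lower_bound, upper_bound) computed in log-space.

    center is exp(mean(log x)), i.e. the geometric mean; the bounds are
    exp(mean -/+ sd) of the logged data, so both are strictly positive.
    """
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    logs = [math.log(v) for v in values]
    mu = statistics.fmean(logs)
    sigma = statistics.stdev(logs)  # sample standard deviation of log(x)
    return math.exp(mu), math.exp(mu - sigma), math.exp(mu + sigma)

# Hypothetical strictly positive sample
sample = [0.05, 0.1, 0.2, 0.4, 0.8]
center, lower, upper = log_space_error_bars(sample)
print(f"center = {center:.3f}, bounds = [{lower:.3f}, {upper:.3f}]")
```

The resulting bounds are asymmetric around the centre on the original scale, with the upper bar longer than the lower one.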

To summarise, there are a lot of ways to solve this problem, including:

  • Taking a Bayesian approach, and incorporating the knowledge that $X>0$;
  • Using a robust error measurement method, such as a $68\%$ confidence interval;
  • Transforming your data into log-space to ensure non-negativity.

I recommend looking into each of these methods in more detail, and deciding which one is best for the problem you have at hand. Have fun!