Solved – What is implied by the standard deviation being much larger than the mean?

Tags: distributions, standard deviation

What does it imply when the standard deviation is more than twice the mean? Our data are timing data from event durations and so are strictly positive. (Sometimes very small negatives show up due to clock-resolution issues.) We are accustomed to the following table (locally developed):

stdev / mean <= 0.5        : treat as a normal distribution
0.5 < stdev / mean <= 0.75 : usually normal, but might be exponential
0.75 < stdev / mean <= 2   : exponential / Poisson
stdev / mean > 2           : outside inhibitors dominate
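
For concreteness, here is a minimal Python sketch of how such a table might be applied to a sample of durations; the function name and the coding of the thresholds are my own illustration, not the original tooling:

```python
import numpy as np

def classify_by_cv(durations):
    """Classify a sample of event durations using the rule-of-thumb table above.

    The thresholds are the locally developed heuristics from the question,
    not a standard statistical test.
    """
    durations = np.asarray(durations, dtype=float)
    ratio = durations.std(ddof=1) / durations.mean()
    if ratio <= 0.5:
        label = "treat as normal"
    elif ratio <= 0.75:
        label = "usually normal, possibly exponential"
    elif ratio <= 2:
        label = "exponential / Poisson"
    else:
        label = "outside inhibitors dominate"
    return ratio, label

# Exponentially distributed timings should land near a ratio of 1.
rng = np.random.default_rng(0)
print(classify_by_cv(rng.exponential(scale=5.0, size=10_000)))
```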

In this case we got a ratio of 10.7, and outside inhibitors (meaningless external variables) have been eliminated.

What we're trying to do is get some kind of estimate of whether the fat tail is going to kill the estimate of the mean. The default model is noise applied to a constant time from an effectively constant load distribution; we reject that and replace it with an exponential model of the load distribution when the stdev gets too large. Observationally, we know that breakers almost always appear in the exponential distributions, due to variables we cannot account for.
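
As a rough illustration of why the fat tail matters for the mean, here is a sketch comparing how stable the sample mean is under a "constant time plus noise" model versus a heavy-tailed stand-in (a lognormal with made-up parameters; none of this is from the original post):

```python
import numpy as np

rng = np.random.default_rng(1)

def sd_of_sample_mean(sampler, n=200, reps=2000):
    """Empirical spread of the sample mean across many replications of size n."""
    means = np.array([sampler(n).mean() for _ in range(reps)])
    return means.std(ddof=1)

# "Constant time plus noise" model: the mean estimate is stable.
constant_plus_noise = lambda n: 10.0 + rng.normal(0.0, 1.0, size=n)

# Heavy-tailed stand-in (lognormal): the same sample size gives a far
# noisier estimate of the mean, because rare huge values dominate it.
heavy_tailed = lambda n: rng.lognormal(mean=0.0, sigma=2.5, size=n)

print("SD of sample mean, constant + noise:", sd_of_sample_mean(constant_plus_noise))
print("SD of sample mean, heavy-tailed:    ", sd_of_sample_mean(heavy_tailed))
```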

And then this thing popped up. We eliminated all external variables and still it remains. Our theoretical model for this case says it should be a bimodal normal (that is, the weighted sum of two normals), but this doesn't look like it. If it weren't for the fact that we're reasonably confident we've seen the largest data point, at just over 8 standard deviations from the mean, I'd think we simply hadn't reached the second hump of the bimodal distribution yet. Incidentally, we have the median, which is 13 times smaller than the mean.
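
One way to sanity-check the bimodal-normal hypothesis is to compute the same summary ratios on simulated data and see what a two-component normal mixture can plausibly produce; the mixture weights and parameters below are purely hypothetical:

```python
import numpy as np

def diagnostics(x):
    """The summary ratios discussed above: stdev/mean, mean/median,
    and the largest point's distance from the mean in SD units."""
    x = np.asarray(x, dtype=float)
    m, s, med = x.mean(), x.std(ddof=1), np.median(x)
    return {"stdev/mean": s / m,
            "mean/median": m / med,
            "max z-score": (x.max() - m) / s}

# A two-component normal mixture with made-up parameters, for comparison
# against the ratios quoted in the question.
rng = np.random.default_rng(2)
n = 100_000
fast = rng.random(n) < 0.9
mixture = np.where(fast,
                   rng.normal(1.0, 0.2, size=n),    # fast path
                   rng.normal(10.0, 1.0, size=n))   # slow path
print(diagnostics(mixture))
```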

As for the quick answer: the plot does not exist because the mean and standard deviation are dominated by single outliers separated by more than the mean's value. If I scale the histogram based on the median, the important part of the graph runs off the right; if I scale it based on the mean, the left-most bar blows off the top of the graph and the right-hand side is still indistinguishable from noise, because no histogram bar on the right has a count greater than 1.
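
Not part of the original post, but a common workaround for exactly this plotting problem is to use log-spaced bins with a log-scaled count axis; a sketch with stand-in data:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
durations = rng.lognormal(mean=0.0, sigma=2.5, size=50_000)  # stand-in data

# Log-spaced bins keep both the spike near the median and the long right
# tail visible on one plot; a log-scaled count axis keeps bins holding a
# single outlier (count == 1) from vanishing.
bins = np.logspace(np.log10(durations.min()), np.log10(durations.max()), 60)
plt.hist(durations, bins=bins)
plt.xscale("log")
plt.yscale("log")
plt.xlabel("duration")
plt.ylabel("count")
plt.show()
```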

Best Answer

Absolutely nothing.

Even when you are dealing with normal distributions, these are an example of a location-scale family of distributions, which means I can choose the center (mean) and spread (SD) to be anything I want them to be.
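
To make the location-scale point explicit: if $Z \sim N(0,1)$, then for any $\mu$ and any $\sigma > 0$,

$$X = \mu + \sigma Z \sim N(\mu,\ \sigma^2),$$

so the normal family places no constraint at all on the ratio of the SD to the mean.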

A normal probability model is a poor choice for modeling time-to-event outcomes. If the probability model is exponential, the variance is related to the square of the mean, so an SD greater than the mean gives some evidence that the mean is greater than 1 in whichever units you have used to measure the outcome. But that is purely ad hoc: you would do better to use maximum likelihood to estimate characteristics of the survival times directly, rather than make broad inferences.
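
As a sketch of what estimating the survival times directly by maximum likelihood could look like under an exponential model (simulated data; scipy assumed available):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
times = rng.exponential(scale=3.0, size=1_000)  # simulated event durations

# For a one-parameter exponential model the MLE of the scale (the mean
# survival time) is just the sample mean; the rate is its reciprocal.
scale_mle = times.mean()
print("MLE scale:", scale_mle, "MLE rate:", 1.0 / scale_mle)

# scipy fits the same model; floc=0 pins the location at zero so only
# the scale parameter is estimated.
loc, scale = stats.expon.fit(times, floc=0)
print("scipy fitted scale:", scale)
```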

In the case of one-sample hypothesis testing where your hypothesis is that the mean is 0, we can say a bit more. The standard deviation of the data is related to the standard error of the sample mean by the Central Limit Theorem: $SE = SD / \sqrt{n}$.

If you mean that the sample mean is less than 2 times its standard error, then normal probability laws tell us there is little evidence that the mean is nonzero.
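
A minimal numerical illustration of that rule of thumb (hypothetical data; scipy's one-sample t-test stands in for the normal-theory test):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(loc=0.5, scale=5.0, size=40)  # hypothetical sample

se = x.std(ddof=1) / np.sqrt(x.size)   # SE = SD / sqrt(n)
print("mean:", x.mean(), "SE:", se, "mean / SE:", x.mean() / se)

# If the mean is within about 2 standard errors of zero, the usual
# normal-theory test finds little evidence that it is nonzero.
print(stats.ttest_1samp(x, popmean=0.0))
```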
