Solved – What is implied by the standard deviation being much larger than the mean?

Tags: distributions, standard deviation

What does it imply when the standard deviation is more than twice the mean? Our data are timing data from event durations and so are strictly positive. (Sometimes very small negatives show up due to clock-resolution issues.) We are accustomed to the following table (locally developed):

stdev / mean <= 0.5        : treat as a normal distribution
0.5 < stdev / mean <= 0.75 : usually normal, but might be exponential
0.75 < stdev / mean <= 2   : exponential / Poisson
stdev / mean > 2           : outside inhibitors dominate
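
For concreteness, here is a minimal Python sketch of how such a table might be applied to a sample of durations; the function name and the coding of the thresholds are my own illustration, not the original tooling:

```python
import numpy as np

def classify_by_cv(durations):
    """Classify a sample of event durations using the rule-of-thumb table above.

    The thresholds are the locally developed heuristics from the question,
    not a standard statistical test.
    """
    durations = np.asarray(durations, dtype=float)
    ratio = durations.std(ddof=1) / durations.mean()
    if ratio <= 0.5:
        label = "treat as normal"
    elif ratio <= 0.75:
        label = "usually normal, possibly exponential"
    elif ratio <= 2:
        label = "exponential / Poisson"
    else:
        label = "outside inhibitors dominate"
    return ratio, label

# Exponentially distributed timings should land near a ratio of 1.
rng = np.random.default_rng(0)
print(classify_by_cv(rng.exponential(scale=5.0, size=10_000)))
```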

In this case we got a ratio of 10.7, and outside inhibitors (meaningless external variables) have been eliminated.

What we're trying to do is get some kind of estimate of whether the fat tail is going to kill the estimate of the mean. The default model is noise applied to a constant time from an effectively constant load distribution; we reject that and replace it with an exponential model of the load distribution when the stdev gets too large. Observationally, we know that breakers almost always appear in the exponential distributions, due to variables we cannot account for.
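
As a rough illustration of why the fat tail matters for the mean, here is a sketch comparing how stable the sample mean is under a "constant time plus noise" model versus a heavy-tailed stand-in (a lognormal with made-up parameters; none of this is from the original post):

```python
import numpy as np

rng = np.random.default_rng(1)

def sd_of_sample_mean(sampler, n=200, reps=2000):
    """Empirical spread of the sample mean across many replications of size n."""
    means = np.array([sampler(n).mean() for _ in range(reps)])
    return means.std(ddof=1)

# "Constant time plus noise" model: the mean estimate is stable.
constant_plus_noise = lambda n: 10.0 + rng.normal(0.0, 1.0, size=n)

# Heavy-tailed stand-in (lognormal): the same sample size gives a far
# noisier estimate of the mean, because rare huge values dominate it.
heavy_tailed = lambda n: rng.lognormal(mean=0.0, sigma=2.5, size=n)

print("SD of sample mean, constant + noise:", sd_of_sample_mean(constant_plus_noise))
print("SD of sample mean, heavy-tailed:    ", sd_of_sample_mean(heavy_tailed))
```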

And then this thing popped up. We eliminated all external variables and still it remains. Our theoretical model for this case says it should be a bimodal normal (that is, the weighted sum of two normals), but this doesn't look like it. If it weren't for the fact that we're reasonably confident we've seen the largest data point, at just over 8 standard deviations from the mean, I'd think we simply hadn't reached the second hump of the bimodal distribution yet. Incidentally, we have the median, which is 13 times smaller than the mean.
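
One way to sanity-check the bimodal-normal hypothesis is to compute the same summary ratios on simulated data and see what a two-component normal mixture can plausibly produce; the mixture weights and parameters below are purely hypothetical:

```python
import numpy as np

def diagnostics(x):
    """The summary ratios discussed above: stdev/mean, mean/median,
    and the largest point's distance from the mean in SD units."""
    x = np.asarray(x, dtype=float)
    m, s, med = x.mean(), x.std(ddof=1), np.median(x)
    return {"stdev/mean": s / m,
            "mean/median": m / med,
            "max z-score": (x.max() - m) / s}

# A two-component normal mixture with made-up parameters, for comparison
# against the ratios quoted in the question.
rng = np.random.default_rng(2)
n = 100_000
fast = rng.random(n) < 0.9
mixture = np.where(fast,
                   rng.normal(1.0, 0.2, size=n),    # fast path
                   rng.normal(10.0, 1.0, size=n))   # slow path
print(diagnostics(mixture))
```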

As for the quick answer: the plot does not exist because the mean and standard deviation are dominated by single outliers separated by more than the mean's value. If I scale the histogram based on the median, the important part of the graph runs off the right; if I scale it based on the mean, the left-most bar blows off the top of the graph and the right-hand side is still indistinguishable from noise, because no histogram bar on the right has a count greater than 1.
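
Not part of the original post, but a common workaround for exactly this plotting problem is to use log-spaced bins with a log-scaled count axis; a sketch with stand-in data:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
durations = rng.lognormal(mean=0.0, sigma=2.5, size=50_000)  # stand-in data

# Log-spaced bins keep both the spike near the median and the long right
# tail visible on one plot; a log-scaled count axis keeps bins holding a
# single outlier (count == 1) from vanishing.
bins = np.logspace(np.log10(durations.min()), np.log10(durations.max()), 60)
plt.hist(durations, bins=bins)
plt.xscale("log")
plt.yscale("log")
plt.xlabel("duration")
plt.ylabel("count")
plt.show()
```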

Best Answer

Absolutely nothing.

Even when you are dealing with normal distributions, these are an example of a location-scale family of distributions, which means I can choose the center (mean) and spread (SD) to be anything I want them to be.
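
To make the location-scale point explicit: if $Z \sim N(0,1)$, then for any $\mu$ and any $\sigma > 0$,

$$X = \mu + \sigma Z \sim N(\mu,\ \sigma^2),$$

so the normal family places no constraint at all on the ratio of the SD to the mean.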

A normal probability model is a poor choice for modeling time-to-event outcomes. If the probability model is exponential, the variance is related to the square of the mean, so an SD greater than the mean gives some evidence that the mean is greater than 1 in whichever units you have used to measure the outcome. But that is purely ad hoc: you would do better to use maximum likelihood to estimate characteristics of the survival times directly, rather than make broad inferences.
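
As a sketch of what estimating the survival times directly by maximum likelihood could look like under an exponential model (simulated data; scipy assumed available):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
times = rng.exponential(scale=3.0, size=1_000)  # simulated event durations

# For a one-parameter exponential model the MLE of the scale (the mean
# survival time) is just the sample mean; the rate is its reciprocal.
scale_mle = times.mean()
print("MLE scale:", scale_mle, "MLE rate:", 1.0 / scale_mle)

# scipy fits the same model; floc=0 pins the location at zero so only
# the scale parameter is estimated.
loc, scale = stats.expon.fit(times, floc=0)
print("scipy fitted scale:", scale)
```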

In the case of one-sample hypothesis testing where your hypothesis is that the mean is 0, we can say a bit more. The standard deviation of the data is related to the standard error of the sample mean by the Central Limit Theorem: $SE = SD / \sqrt{n}$.

If you mean that the sample mean is less than 2 times its standard error, then normal probability laws tell us there is little evidence that the mean is nonzero.
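
A minimal numerical illustration of that rule of thumb (hypothetical data; scipy's one-sample t-test stands in for the normal-theory test):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(loc=0.5, scale=5.0, size=40)  # hypothetical sample

se = x.std(ddof=1) / np.sqrt(x.size)   # SE = SD / sqrt(n)
print("mean:", x.mean(), "SE:", se, "mean / SE:", x.mean() / se)

# If the mean is within about 2 standard errors of zero, the usual
# normal-theory test finds little evidence that it is nonzero.
print(stats.ttest_1samp(x, popmean=0.0))
```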
