[Math] estimate 3-sigma limit of a skewed normal distribution with known mean/variance/skewness

statistics

Is there any method/eqn to estimate the 3-sigma limit of a skewed normal distribution, if the mean/variance/skewness is already known? Thanks!

Best Answer

Limits such as $2\sigma$ and $3\sigma$ originated because a normal distribution with parameters $\mu$ and $\sigma$ has about 95% of its probability in the interval $\mu \pm 2\sigma$ and about 99.7% of its probability in $\mu \pm 3\sigma$. In particular, about 0.00135, or 0.135%, of observations fall above $\mu + 3\sigma.$
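These normal-theory figures are easy to check numerically. A minimal sketch in R using the standard normal CDF `pnorm`; the variable names are only for illustration:

```r
# Probability within 2 and within 3 standard deviations of the mean
within_2sd <- pnorm(2) - pnorm(-2)          # approx. 0.9545
within_3sd <- pnorm(3) - pnorm(-3)          # approx. 0.9973
# Probability of exceeding mu + 3*sigma (upper tail only)
above_3sd <- pnorm(3, lower.tail = FALSE)   # approx. 0.00135
c(within_2sd, within_3sd, above_3sd)
```

Because the normal family is a location-scale family, the same probabilities hold for any $\mu$ and $\sigma$, which is why the standard normal suffices here.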

I don't believe it makes sense to use 'three sigma limits' for non-normal data because the above probability statements are not true for non-normal data or distributions, especially not for markedly skewed ones. Thus the purpose of using such 'limits' is not reliably accomplished.

What does seem to make sense is to revert to the fundamental principle. You could approximate a 99.7% probability interval by taking quantiles 0.00135 and 0.99865 of your simulated sample. Or you could use quantile 0.99865 alone to approximate a value above which 0.135% of observations typically lie.
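As a sketch of the quantile approach: the snippet below builds a hypothetical skewed sample and reads off the two quantiles. The sample `y` only stands in for your data. It is drawn from a skew-normal distribution via the representation $\delta |Z_0| + \sqrt{1-\delta^2}\, Z_1$ with $\delta = \alpha/\sqrt{1+\alpha^2}$ and $Z_0, Z_1$ independent standard normal; the shape `alpha = 4`, the seed, and the sample size are arbitrary choices for illustration.

```r
set.seed(2024)                      # arbitrary seed, only for reproducibility
n     <- 10^5
alpha <- 4                          # hypothetical shape (skewness) parameter
delta <- alpha / sqrt(1 + alpha^2)
# Skew-normal draws: delta*|Z0| + sqrt(1 - delta^2)*Z1, with Z0, Z1 independent N(0,1)
y <- delta * abs(rnorm(n)) + sqrt(1 - delta^2) * rnorm(n)

# Empirical interval containing about 99.73% of the observations
lims <- quantile(y, c(0.00135, 0.99865))
lims

# For comparison: naive mean +/- 3 sd limits, and the fraction above the upper one
naive <- mean(y) + c(-3, 3) * sd(y)
mean(y > naive[2])
```

Quantiles this far into the tails rest on few observations: with $10^5$ draws only about 135 fall beyond each limit, so a large sample is needed for a stable estimate.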


As an illustration, the chi-squared distribution with 3 degrees of freedom has $\mu = 3$ and $\sigma = \sqrt{6} \approx 2.45.$

The following R code generates 10,000 observations from $\mathsf{Chisq}(3).$ (You can get exactly the same sample in R starting with set.seed(1234), as I did.)

set.seed(1234);  x = rchisq(10^4, 3)   # 10,000 draws from Chisq(3)
mean(x);  sd(x)
## 3.011572       # approx. 3
## 2.444897       # approx. 2.45
mean(x > 3 + 3*sqrt(6))
## 0.0152         # sample fraction above the upper '3-sigma limit'
qchisq(.99865, 3)
## 15.6304        # exact quantile .99865 of Chisq(3)

Results are as follows: The upper $3\sigma$ limit is $\mu + 3\sigma = 3 + 3\sqrt{6} \approx 10.348$, above which about 1.52% (not 0.135%) of the simulated observations fall. So the $3\sigma$ limit does not give the anticipated protection.

By contrast, quantile 0.99865 of $\mathsf{Chisq}(3)$ is 15.630 (not 10.348). Thus 15.63 is a bound above which 0.135% of observations fall.
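The same comparison can be made without simulation error by using the exact distribution functions. A short sketch with `pchisq` and `qchisq`; the last lines regenerate the sample with the same seed to show that its empirical quantile can serve as an estimate too, though a noisier one:

```r
upper_3sigma <- 3 + 3 * sqrt(6)              # mu + 3*sigma for Chisq(3), about 10.348
# Exact probability above the 3-sigma limit (roughly 1.6%, versus 0.135% for a normal)
p_above <- pchisq(upper_3sigma, df = 3, lower.tail = FALSE)
p_above
# Exact value cutting 0.135% from the upper tail
q_exact <- qchisq(0.99865, df = 3)           # about 15.63
# Sample-based estimate of the same quantile
set.seed(1234); x <- rchisq(10^4, 3)
q_sample <- quantile(x, 0.99865)
c(exact = q_exact, sample = unname(q_sample))
```

The simulated fraction of 1.52% is consistent with the exact tail probability, given that it is based on about 150 exceedances out of 10,000 draws.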

I do not know details of your simulated distribution. But perhaps my chi-squared example is not outrageously unfair because $\mathsf{Chisq}(3)$ is the distribution for the sum of squares of three independent standard normal random variables.
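That characterization can be checked by simulation. A minimal sketch, with an arbitrary seed and sample size, that squares and sums three independent standard normal draws per row and compares the result with $\mathsf{Chisq}(3)$:

```r
set.seed(42)                               # arbitrary seed
z <- matrix(rnorm(3 * 10^5), ncol = 3)     # each row: three independent N(0,1) draws
s <- rowSums(z^2)                          # sum of squares per row
c(mean = mean(s), sd = sd(s))              # should be near 3 and sqrt(6), about 2.45
# Compare a few sample quantiles with the exact Chisq(3) quantiles
p <- c(0.5, 0.9, 0.99)
rbind(sample = quantile(s, p), exact = qchisq(p, df = 3))
```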

Below is a graph of the density function of $\mathsf{Chisq}(3).$ The vertical red line is located at $\mu + 3\sigma$; the vertical blue line cuts about 0.135% of the area from the upper tail of the density. (In an analogous plot of a normal density curve, red and blue lines would coincide.)

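[Figure: density of $\mathsf{Chisq}(3)$ with a red vertical line at the upper $3\sigma$ limit and a blue vertical line at quantile 0.99865]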
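A plot along those lines can be reproduced with base R graphics. This is a sketch rather than the original plotting code; the axis range, labels, and line styles are guesses:

```r
red_at  <- 3 + 3 * sqrt(6)            # upper 3-sigma limit, mu + 3*sigma
blue_at <- qchisq(0.99865, df = 3)    # exact 0.99865 quantile of Chisq(3)
# Density curve of Chisq(3) on [0, 20]
curve(dchisq(x, df = 3), from = 0, to = 20, n = 501, lwd = 2,
      xlab = "x", ylab = "Density", main = "Density of Chisq(3)")
abline(v = red_at,  col = "red")      # 3-sigma limit
abline(v = blue_at, col = "blue")     # cuts 0.135% from the upper tail
abline(h = 0, col = "darkgray")       # baseline
```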