Is there any method/eqn to estimate the 3-sigma limit of a skewed normal distribution, if the mean/variance/skewness is already known? Thanks!
[Math] estimate 3-sigma limit of a skewed normal distribution with known mean/variance/skewness
statistics
Related Solutions
According to page 3 here the Jeffreys prior is $P[\mu] = c\sqrt{\frac{n}{\sigma^2}}$, which is constant in $\mu$, so it doesn't affect the shape of the posterior.
Noting that $P[\bar{X}|\mu] = \frac{1}{\sqrt{2\pi\frac{\sigma^2}{n}}}\textrm{Exp}[-\frac{(\bar{X}-\mu)^2}{2\frac{\sigma^2}{n}}]$.
Multiplying these two expressions together gives us:
\begin{equation} P[\bar{X},\mu] = \frac{c\sqrt{\frac{n}{\sigma^2}}}{\sqrt{2\pi\frac{\sigma^2}{n}}}\textrm{Exp}[-\frac{(\bar{X}-\mu)^2}{2\frac{\sigma^2}{n}}] \end{equation}
Note that all of the stuff outside of the exponential is just a constant, so examining only the exponential kernel, we can see that the expression is a normal kernel for $\mu | \bar{X} \sim \textrm{N}(\bar{X},\frac{\sigma^2}{n})$.
So this is our posterior distribution.
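This conjugacy result can be sanity-checked numerically: discretize $\mu$ on a grid, multiply the (constant) prior by the normal likelihood, and confirm the normalized posterior has mean $\bar{X}$ and variance $\sigma^2/n$. A sketch in Python, with made-up values for $\sigma$, $n$, and $\bar{X}$:

```python
import numpy as np

# Illustrative values (not from the question): known sigma, flat prior on mu
sigma, n, xbar = 2.0, 25, 10.0

mu = np.linspace(xbar - 5, xbar + 5, 200001)
dmu = mu[1] - mu[0]

# Flat prior times normal likelihood kernel; constants cancel on normalizing
post = np.exp(-(xbar - mu) ** 2 / (2 * sigma**2 / n))
post /= post.sum() * dmu

post_mean = (mu * post).sum() * dmu
post_var = ((mu - post_mean) ** 2 * post).sum() * dmu

print(post_mean)  # close to xbar = 10.0
print(post_var)   # close to sigma^2 / n = 0.16
```

The grid spans many posterior standard deviations ($\sigma/\sqrt{n} = 0.4$ here), so truncation error is negligible.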
The Wiki page Skew normal distribution provides the information to estimate the parameters using the sample mean ($\bar{x}$), standard deviation ($s$), and skewness ($\hat{\gamma}$). The 3 parameters to be estimated are $\mu$, $\sigma$, and $\alpha$.
If $|\hat{\gamma}|$ is below the maximum skewness a skew-normal distribution can attain (about $0.9953$), then $\hat{\alpha}$ is found in two steps:
$$\delta =\sqrt{\frac{\pi \left| \hat{\gamma} \right| ^{2/3}}{2 \left(\left| \hat{\gamma} \right| ^{2/3}+\left(\frac{4-\pi }{2}\right)^{2/3}\right)}}$$
$$\hat{\alpha} = \text{sgn}(\hat{\gamma})\frac{\delta }{\sqrt{1-\delta ^2}}$$
These two steps are the closed-form inversion of the skewness equation
$$\hat{\gamma} =\frac{\sqrt{2} (4-\pi ) \hat{\alpha} ^3}{\left((\pi -2) \hat{\alpha} ^2+\pi \right)^{3/2}}$$
If $|\hat{\gamma}|$ exceeds the attainable maximum, no skew-normal distribution matches the sample skewness (a common fallback is to set $|\delta| = 1$). Then $\hat{\mu}$ and $\hat{\sigma}$ are
$$\hat{\sigma} =\frac{s}{\sqrt{1-\frac{2 \hat{\alpha} ^2}{\pi \left(\hat{\alpha} ^2+1\right)}}}$$
$$\hat{\mu} =\bar{x}-\frac{\sqrt{\frac{2}{\pi }} \hat{\alpha} \hat{\sigma} }{\sqrt{\hat{\alpha} ^2+1}}$$
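The formulas above translate directly into code. A Python sketch (the original answer uses Mathematica; the function name skewnorm_mom is my own), checked against the exact moments of a skew-normal with $\alpha = 3$, $\mu = 0$, $\sigma = 1$:

```python
import numpy as np
from scipy import stats

def skewnorm_mom(xbar, s, g):
    """Method-of-moments estimates (mu, sigma, alpha) for a skew-normal,
    from sample mean xbar, standard deviation s, and skewness g."""
    # Step 1: delta from the skewness
    delta = np.sqrt(np.pi * abs(g) ** (2 / 3)
                    / (2 * (abs(g) ** (2 / 3) + ((4 - np.pi) / 2) ** (2 / 3))))
    # Step 2: alpha from delta
    alpha = np.sign(g) * delta / np.sqrt(1 - delta**2)
    # Scale and location
    sigma = s / np.sqrt(1 - 2 * alpha**2 / (np.pi * (alpha**2 + 1)))
    mu = xbar - np.sqrt(2 / np.pi) * alpha * sigma / np.sqrt(alpha**2 + 1)
    return mu, sigma, alpha

# Feed in the exact population moments; the estimator should recover
# the true parameters, since the formulas invert the moment equations.
m, v, g = stats.skewnorm.stats(3, loc=0, scale=1, moments="mvs")
mu_hat, sigma_hat, alpha_hat = skewnorm_mom(float(m), float(np.sqrt(v)), float(g))
print(mu_hat, sigma_hat, alpha_hat)  # close to 0, 1, 3
```

With sample moments instead of exact ones, the recovered parameters are of course only approximate.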
Armed with the parameter estimates, one can evaluate the cumulative distribution function:
$$Pr(X \le x)=\Phi\left(\frac{x\, -\hat{\mu} }{\hat{\sigma} }\right)-2 T\left(\frac{x\, -\hat{\mu} }{\hat{\sigma} },\hat{\alpha} \right)$$
where $T$ is Owen's T function: $T(x,a)=\frac{1}{2 \pi }\int_0^a \frac{\exp \left(-\frac{x^2 \left(1+t^2\right)}{2}\right)}{1+t^2} \, dt$ .
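This CDF formula can be cross-checked with SciPy, which exposes Owen's T directly as scipy.special.owens_t. A sketch with made-up parameter values:

```python
from scipy import stats
from scipy.special import owens_t

def skewnorm_cdf(x, mu, sigma, alpha):
    """Skew-normal CDF via Phi(z) - 2*T(z, alpha), z = (x - mu)/sigma."""
    z = (x - mu) / sigma
    return stats.norm.cdf(z) - 2 * owens_t(z, alpha)

# Illustrative (made-up) parameter estimates
mu_hat, sigma_hat, alpha_hat = 1.0, 2.0, 3.0

p = skewnorm_cdf(4.0, mu_hat, sigma_hat, alpha_hat)
print(p)
# Should agree with SciPy's built-in skew-normal CDF:
print(stats.skewnorm.cdf(4.0, alpha_hat, loc=mu_hat, scale=sigma_hat))
```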
Here is an implementation using Mathematica. I know code should be given as text, but in this case, because it is unlikely that you have Mathematica (and it would look much messier as text), an image should still be instructive as to the process.
To estimate the percentage of the distribution no larger than a specified value, use the cumulative distribution function (CDF) described on the Wiki page. Using Mathematica for $X = 7.5$ and $X = 5.2$:
If you have access to the statistical package R, then the sn package will perform these calculations.
Best Answer
Limits such as $2\sigma$ and $3\sigma$ originated because a normal distribution with parameters $\mu$ and $\sigma$ has about 95% of its probability in the interval $\mu \pm 2\sigma$ and about 99.7% of its probability in $\mu \pm 3\sigma$. In particular, less than 0.00135 or 0.135% of observations will fall above $\mu + 3\sigma.$
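These normal tail probabilities are easy to confirm numerically, for example in Python:

```python
from scipy.stats import norm

# Coverage of mu +/- k*sigma for a normal distribution is 2*Phi(k) - 1
print(2 * norm.cdf(2) - 1)  # ~0.9545: probability within mu +/- 2 sigma
print(2 * norm.cdf(3) - 1)  # ~0.9973: probability within mu +/- 3 sigma
print(norm.sf(3))           # ~0.00135: probability above mu + 3 sigma
```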
I don't believe it makes sense to use 'three sigma limits' for non-normal data, because the above probability statements are not true for non-normal distributions, especially markedly skewed ones. Thus the purpose of using such 'limits' is not reliably accomplished.
What does seem to make sense is to revert to the fundamental principle. You could approximate a 99.7% probability interval by taking quantiles 0.00135 and 0.99865 of your simulated sample. Or quantile 0.99865 to approximate a value above which 0.135% of observations typically lie.
As an illustration, the chi-squared distribution with 3 degrees of freedom has $\mu = 3$ and $\sigma = \sqrt{6} \approx 2.45.$
The following R code generates 10,000 observations from $\mathsf{Chisq}(3)$. (You can get exactly the same sample in R by starting with set.seed(1234), as I did.) Results are as follows: if I find the upper $3\sigma$ limit, I get about 10.348, above which about 1.52% (not 0.135%) of the observations fall. So the $3\sigma$ limit does not give the anticipated protection.
By contrast, the 0.99865 quantile of $\mathsf{Chisq}(3)$ is 15.630 (not 10.348). Thus 15.63 is a bound above which 0.135% of observations fall.
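The sample-based numbers above can be checked against the exact $\mathsf{Chisq}(3)$ distribution (a Python sketch; the original answer used R):

```python
import numpy as np
from scipy.stats import chi2

df = 3
mu, sigma = df, np.sqrt(2 * df)      # mean 3, standard deviation sqrt(6)

upper_3sigma = mu + 3 * sigma
print(upper_3sigma)                  # about 10.35
print(chi2.sf(upper_3sigma, df))     # about 0.016 in the upper tail, not 0.00135

print(chi2.ppf(0.99865, df))         # about 15.63: the bound that actually
                                     # leaves 0.135% above it
```

The exact tail probability beyond $\mu + 3\sigma$ is about 1.6%, close to the roughly 1.52% observed in the simulated sample.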
I do not know details of your simulated distribution. But perhaps my chi-squared example is not outrageously unfair because $\mathsf{Chisq}(3)$ is the distribution for the sum of squares of three standard normal random variables.
Below is a graph of the density function of $\mathsf{Chisq}(3).$ The vertical red line is located at $\mu + 3\sigma$; the vertical blue line cuts about 0.135% of the area from the upper tail of the density. (In an analogous plot of a normal density curve, the red and blue lines would coincide.)