To answer the question from your title "Calculating percentile value from mean and standard deviation of a normal distribution":
In practice one can do that (i.e. compute the normal cumulative distribution function $\Phi$) by converting the raw value to a Z-score (subtract the mean, then divide by the standard deviation) and then using a lookup table (often called a Z-table) to convert the Z-score to a probability; multiply that by 100 to get a percentile. Wikipedia has both the table(s) and examples of how to use them.
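As a sketch of that two-step recipe (Python is used here for illustration; the mean, SD, and raw score are made-up example values):

```python
from statistics import NormalDist  # Python stdlib; replaces the lookup table

# Illustrative values: Normal with mean 100, SD 15, raw score x = 119.2.
mean, sd = 100, 15
x = 119.2
z = (x - mean) / sd        # Z-score, here 1.28
p = NormalDist().cdf(z)    # standard normal CDF, what a Z-table tabulates
percentile = 100 * p       # about the 90th percentile
```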
If one needs more precision than a lookup table provides, there are numerical algorithms that can compute it. The one in R's pnorm is based on
- Cody, W. J. (1993) Algorithm 715: SPECFUN – A portable FORTRAN package of special function routines and test drivers. ACM Transactions on Mathematical Software 19, 22–32.
There are numerous other approaches that rely on the simple transformation from $\Phi$ to the error function (erf), for which one can find many approximations. The paper by Soranzo and Epure (see the probit discussion below) also gives an approximation formula directly: $$ \Phi(x) \approx 2^{-22^{1-41^{x/10}}} $$
Or, more legibly: `2**(-22**(1-41**(x/10)))`. Note this relies on the symmetry $\Phi(-x) = 1-\Phi(x)$ to extend it to negative arguments while preserving low error.
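A sketch of that formula (in Python for illustration), with the symmetry trick for negative arguments, compared against the stdlib's accurate CDF:

```python
from statistics import NormalDist

def phi_approx(x):
    """Soranzo-Epure closed-form approximation to the standard normal CDF."""
    if x < 0:
        return 1 - phi_approx(-x)   # symmetry: Phi(-x) = 1 - Phi(x)
    return 2 ** (-(22 ** (1 - 41 ** (x / 10))))

# Worst absolute error on a grid over [-3, 3]: on the order of 1e-4.
err = max(abs(phi_approx(k / 10) - NormalDist().cdf(k / 10))
          for k in range(-30, 31))
```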
In the body of your question you ask about the opposite problem: "is it possible to determine the x value of the 95th percentile?" That's possible too. In general this is called the inverse cumulative distribution function or, more succinctly, the quantile function; for the normal distribution that function is just called probit, so that's the shortest word-like name for $\Phi^{-1}$. In R the probit is implemented in qnorm, whose numerical implementation is based on
- Wichura, M. J. (1988) Algorithm AS 241: The percentage points of the normal distribution. Applied Statistics 37, 477–484.
Besides that, the probit is related to the inverse error function by a simple algebraic formula, and there are approximation formulas for the latter as well, e.g.
$$\operatorname{erf}^{-1}(x) \approx
\operatorname{sgn}(x) \sqrt{
\sqrt{\left(\frac{2}{\pi a} + \frac{\ln(1 - x^2)}{2}\right)^2 - \frac{\ln(1 - x^2)}{a}} -
\left(\frac{2}{\pi a} + \frac{\ln(1 - x^2)}{2}\right)
}.
$$ where
$$ a = \frac{8(\pi - 3)}{3\pi(4 - \pi)} \approx 0.140012.$$
Then:
$$\operatorname{probit}(p) = \sqrt{2}\,\operatorname{erf}^{-1}(2p-1).$$
If it needs spelling out: probit gives you the z-score from the probability $p$ (the percentile divided by 100). To convert the z-score to your "x" you then apply the opposite of the z-score transformation, i.e. multiply by the std-dev and then add the mean.
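Putting those pieces together as a sketch (Python for illustration; the constant $a$ and both formulas are the ones given above, and the mean/SD at the end are made-up example values):

```python
import math

a = 8 * (math.pi - 3) / (3 * math.pi * (4 - math.pi))  # ~ 0.140012

def erfinv_approx(x):
    """Approximate inverse error function, per the formula above."""
    t = 2 / (math.pi * a) + math.log(1 - x * x) / 2
    return math.copysign(
        math.sqrt(math.sqrt(t * t - math.log(1 - x * x) / a) - t), x)

def probit_approx(p):
    """probit(p) = sqrt(2) * erfinv(2p - 1)."""
    return math.sqrt(2) * erfinv_approx(2 * p - 1)

# Undo the z-score transform for an illustrative Normal(mean=100, sd=15):
mean, sd = 100, 15
z = probit_approx(0.95)   # roughly 1.645
x95 = mean + sd * z       # roughly 124.7, the 95th-percentile value
```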
If you don't care much about accuracy, you can go old school and approximate the probit by logit, e.g. compute it as
$$\operatorname{probit}(p) \approx \sqrt{\frac{\pi}{8}}\,\ln\left( \frac{p}{1-p} \right).$$
The latter approximation gets pretty bad as $p$ gets high or low (i.e. it's best around 0.5).
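A quick numerical check of the logit shortcut (Python for illustration, comparing against the stdlib's accurate probit):

```python
import math
from statistics import NormalDist

def probit_logit(p):
    # sqrt(pi/8) * logit(p), the classical probit-by-logit shortcut
    return math.sqrt(math.pi / 8) * math.log(p / (1 - p))

exact = NormalDist().inv_cdf   # accurate probit for comparison
# Near 0.5 the match is good; in the tails it degrades noticeably:
errors = {p: probit_logit(p) - exact(p) for p in (0.6, 0.9, 0.99)}
```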
Another good approximation for probit from a recent paper by Soranzo and Epure (2014) is
$$\operatorname{probit}(p) \approx \frac{10}{\ln 41}\, \ln \left(1- \frac{\ln \frac{-\ln p}{\ln 2}}{\ln 22} \right) $$
This has low error for $p \ge 0.5$, but one can use the symmetry $ \operatorname{probit}(1-p) = -\operatorname{probit}(p) $ for $p$ below 0.5.
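A sketch of this approximation with the symmetry extension (Python for illustration, checked against the stdlib's probit):

```python
import math
from statistics import NormalDist

LN2, LN22, LN41 = math.log(2), math.log(22), math.log(41)

def probit_se(p):
    """Soranzo-Epure probit approximation, extended below 0.5 by symmetry."""
    if p < 0.5:
        return -probit_se(1 - p)
    return 10 / LN41 * math.log(1 - math.log(-math.log(p) / LN2) / LN22)

# e.g. probit_se(0.975) is close to the exact 1.95996
```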
The straightforward way is to use the definition of percentile (which
differs a bit from text to text and software to software) and count
observations. This works for data from any distribution. (Differences
in definitions do not matter much for large samples.)
Roughly speaking, the 90th percentile is a value below which one finds
not more than 90% of the observations, and above which one finds not more
than 10% of them.
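As a sketch of the counting/interpolation idea (Python for illustration; the tiny dataset is made up, and the interpolation rule used is the linear one that matches R's default `quantile()` type 7):

```python
data = sorted([12, 5, 7, 3, 9, 15, 8, 11, 4, 10])  # made-up sample
n = len(data)

def percentile(p):
    """Linear interpolation between order statistics (R's quantile type 7)."""
    h = (n - 1) * p          # fractional index into the sorted data
    lo = int(h)
    if lo == n - 1:
        return data[-1]
    return data[lo] + (h - lo) * (data[lo + 1] - data[lo])

p90 = percentile(0.90)       # between the 9th and 10th order statistics
```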
Here are a few examples of percentiles for a couple of datasets in R, one
normal (symmetrical) and one exponential (right-skewed). Notice that
percentiles of small samples do not necessarily match percentiles of
the populations from which they are sampled. (The method you have been
using for normal data seems to conflate the two kinds of percentiles.)
In the data displays below, the numbers in brackets give the index of the
first observation in that row.
x = round(sort(rnorm(50, 100, 15)), 1); x # generate 50 obs from Norm mean=100, SD=15
[1] 61.1 69.4 71.1 73.0 73.9 77.5 78.0 78.0 79.0 81.5
[11] 83.4 85.9 86.5 87.8 87.9 88.0 88.8 90.0 90.7 91.3
[21] 92.0 93.0 95.3 97.9 97.9 99.2 99.2 100.0 100.2 101.0
[31] 102.9 103.4 104.3 104.6 105.4 107.2 108.5 108.6 109.5 109.6
[41] 111.3 111.5 111.9 118.0 119.5 119.6 119.6 119.9 121.5 128.4
quantile(x, .9) # 90th percentile
90%
119.51
quantile(x, .7) # 70th percentile
70%
105.94
qnorm(c(.9, .7), 100, 15) # 90th and 70th percentiles of POPULATION
[1] 119.2233 107.8660
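(As a cross-check, Python's stdlib gives the same population percentiles as qnorm here:)

```python
from statistics import NormalDist

pop = NormalDist(mu=100, sigma=15)
p90 = pop.inv_cdf(0.9)   # matches qnorm(.9, 100, 15), about 119.2233
p70 = pop.inv_cdf(0.7)   # matches qnorm(.7, 100, 15), about 107.8660
```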
x = round(sort(rexp(60, rate=1/50)), 1); x # generate 60 obs from EXP mean=50
[1] 0.0 0.4 0.5 0.9 0.9 1.3 2.4 3.8 4.4 6.1
[11] 7.5 7.9 8.0 9.0 9.9 11.0 11.5 13.3 15.4 16.4
[21] 19.6 20.3 25.1 25.4 28.0 28.8 29.3 29.5 31.1 32.0
[31] 32.1 34.2 37.2 40.6 42.0 42.0 49.5 52.2 55.6 56.8
[41] 59.6 64.3 73.9 74.7 78.7 87.9 90.1 95.4 97.2 105.2
[51] 110.3 113.8 114.6 172.5 187.0 188.2 188.8 207.9 259.7 265.3
quantile(x, .9)
90%
173.95
quantile(x, .7)
70%
67.18
qexp(c(.9, .7), rate=1/50) # 90th and 70th percentiles of POPULATION
[1] 115.12925  60.19864
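(The exponential population percentiles also have a simple closed form, $Q(p) = -\ln(1-p)/\text{rate}$, which reproduces the qexp values above; a Python sketch:)

```python
import math

rate = 1 / 50                      # population mean 50, as in the R example
q90 = -math.log(1 - 0.9) / rate    # about 115.12925
q70 = -math.log(1 - 0.7) / rate    # about 60.19864
```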
Best Answer
Look up a table for the standard normal distribution. The random variable $X$ is normally distributed. Let $Z = \frac{X-\mu}{\sigma}$ be its standardisation. Then we're interested in $$ \mathbb P\left(Z\leq \frac{t_0-\mu}{\sigma}\right) = 0.9. $$ The table can be used to find the closest value, and one can then solve for $t_0$.
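For instance (Python for illustration; $\mu$ and $\sigma$ are made-up values, and `inv_cdf` plays the role of the table lookup):

```python
from statistics import NormalDist

mu, sigma = 100, 15              # illustrative parameters
z = NormalDist().inv_cdf(0.9)    # the table-lookup step: z is about 1.2816
t0 = mu + sigma * z              # solve (t0 - mu)/sigma = z for t0
```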
Also, check quickly what happens when the percentile increases or decreases, or how changing the mean or standard deviation affects the result.
Here is a flexible table and graph for the standard normal distribution to try.