Normal Distribution – Finding Gaussian Distribution Given 15th and 50th Percentile

normal distribution, quantiles

My knowledge of statistics is limited, so I need a little help.

I have some data showing the relationship between age and walking speed. The walking speeds are reported only as the 15th and 50th percentiles.
For example: Child (0-12 years): 1.07 m/s (15th percentile), 1.33 m/s (50th percentile).

I need to find what percentage lies between 1.0 m/s and 1.2 m/s, between 1.2 m/s and 1.4 m/s, and so on.
That is, the likelihood of observing a child walking at between 1.0 and 1.2 m/s, etc.

Is that possible? Please help me out.

Best Answer

This is not possible unless you know (or assume) the shape of the distribution. I will illustrate using the Normal distribution family. The same procedure works for almost any shape of distribution, provided a unique distribution in the family can be determined by just two percentiles. (In practice this usually means the family uses at most two parameters.)

The first step is to draw the probability density function (PDF) for the distribution. This method uses areas to represent probabilities. In the present case the information available to us means that $15$% of the area must lie to the left of $1.07$ m/s and $50$% of the area must lie to the left of $1.33$ m/s. (Subtracting, we deduce that $50 - 15 = 35$% of the area lies between $1.07$ and $1.33$.)

Figure 1: Normal distribution with highlighted areas

The next step is to look up or compute the values of these percentiles for a standard version of the distribution: this is a distribution of the desired shape (such as Normal) whose percentiles have been tabulated or can be computed. (For mathematical convenience, it frequently has a mean of zero and a unit standard deviation.) In the example we find that the $15$th percentile of the standard Normal distribution is near $-1.04$ and its $50$th percentile is $0$ (exactly). Label the x-axis both with the standard values and with your values:
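If you would rather compute the standard percentiles than look them up, here is a minimal sketch using Python's standard-library `statistics.NormalDist`. The variable names are mine, chosen for illustration.

```python
from statistics import NormalDist

# Standard Normal distribution: mean 0, standard deviation 1.
std = NormalDist()

# inv_cdf(p) returns the value below which a fraction p of the area lies.
z15 = std.inv_cdf(0.15)  # 15th percentile, close to -1.04
z50 = std.inv_cdf(0.50)  # 50th percentile (the median), which is 0

print(f"15th percentile: {z15:.4f}")
print(f"50th percentile: {z50:.4f}")
```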

Figure 2: Normal distribution with labeled percentiles

Because the x-axis has been labeled with two distinct values that are relevant to you, you can now figure out (through linear interpolation and extrapolation) precisely where to label it with any other values. For instance, suppose you are interested in computing the area between $1.6$ m/s and $1.8$ m/s. Find the locations of those labels and--via linear extrapolation--determine the corresponding standard values:

Figure 3: Normal distribution with new labeled percentiles

As an example of the calculation, $1.6$ is $0.27$ greater than $1.33$ and $1.33$ is $0.26$ greater than $1.07$. Because $1.33$ is $1.04$ greater than $1.07$ in the units of the standard distribution, $1.6$ should be proportionately greater than $1.33$ in the standard units: that is, it should be $0.27/0.26$ times $1.04$ greater than $0$. That's where the $1.08$ comes from in the figure. The value of $1.87$ for $1.8$ m/s is determined in the same way, as $0.47/0.26$ times the standard distance. (With the rounded $1.04$ this gives about $1.88$; the figure's $1.87$ comes from the unrounded value, which is near $1.0364$.)
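The relabeling of the axis can be sketched in a few lines of Python. This is only an illustration under the Normal assumption: the helper `to_standard` and the constant names are hypothetical, and it uses the unrounded 15th percentile rather than $-1.04$.

```python
from statistics import NormalDist

# Known percentiles in m/s (child, 0-12 years) from the question.
X15, X50 = 1.07, 1.33

# Matching percentiles of the standard Normal distribution.
Z15 = NormalDist().inv_cdf(0.15)  # about -1.0364
Z50 = 0.0                         # the median of the standard Normal

def to_standard(x):
    """Linearly interpolate or extrapolate a speed x (m/s) to standard units."""
    slope = (Z50 - Z15) / (X50 - X15)  # standard units per m/s
    return Z50 + slope * (x - X50)

for speed in (1.6, 1.8):
    print(f"{speed} m/s -> {to_standard(speed):.2f} standard units")
```

By construction, `to_standard(1.07)` returns the 15th percentile of the standard distribution and `to_standard(1.33)` returns zero.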

Once again, using tables or calculations of the standard distribution (or even a visual estimate), you can determine the area between the values of interest: in this case it is near $11$%.
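The whole procedure can also be done numerically. The sketch below assumes walking speeds really are Normal. It recovers the mean and standard deviation from the two percentiles and then takes differences of the cumulative distribution function. The helpers `fit_normal` and `prob_between` are hypothetical names introduced here for illustration.

```python
from statistics import NormalDist

def fit_normal(x15, x50):
    """Return the Normal distribution with the given 15th and 50th percentiles."""
    z15 = NormalDist().inv_cdf(0.15)  # about -1.0364
    mu = x50                          # for a Normal, the median equals the mean
    sigma = (x15 - x50) / z15         # both differences are negative, so sigma > 0
    return NormalDist(mu=mu, sigma=sigma)

def prob_between(dist, lo, hi):
    """Area under the PDF of dist between lo and hi."""
    return dist.cdf(hi) - dist.cdf(lo)

child = fit_normal(1.07, 1.33)
print(f"mean = {child.mean:.3f} m/s, sd = {child.stdev:.3f} m/s")

# The intervals asked about in the question, plus the one worked above.
for lo, hi in [(1.0, 1.2), (1.2, 1.4), (1.6, 1.8)]:
    print(f"P({lo} <= speed <= {hi}) = {prob_between(child, lo, hi):.3f}")
```

The last interval should reproduce the figure of roughly $11$% found graphically.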

This pictorial method is useful not only for guiding the calculations but also for reasoning semi-quantitatively about the distribution. For instance, it should now be apparent that if walking speeds are indeed (at least approximately) normally distributed, then very few will exceed $2.1$ m/s (that would be out where the right tail appears to meet the x-axis, suggesting a very small area) or be less than $0.55$ m/s (that would be where the left tail meets the x-axis).
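To put rough numbers on "very few", here is a short check of the two tail areas, again a sketch that assumes the Normal shape holds all the way out into the tails (which real walking-speed data may not do). The variable names are illustrative.

```python
from statistics import NormalDist

# Normal distribution matching the 15th (1.07 m/s) and 50th (1.33 m/s) percentiles.
z15 = NormalDist().inv_cdf(0.15)
speeds = NormalDist(mu=1.33, sigma=(1.07 - 1.33) / z15)

upper_tail = 1.0 - speeds.cdf(2.1)  # fraction walking faster than 2.1 m/s
lower_tail = speeds.cdf(0.55)       # fraction walking slower than 0.55 m/s

print(f"P(speed > 2.1)  = {upper_tail:.4f}")
print(f"P(speed < 0.55) = {lower_tail:.4f}")
```

Both areas come out well under one percent, consistent with the visual impression from the figure.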
