Can someone help me understand the intuition behind the formula for finding the kth percentile?

statistics

My textbook just gave the formula without any explanation, here's the text:

If you were to do a little research, you would find several formulas for calculating the kth percentile. Here is one of them.

k = the kth percentile. It may or may not be part of the data.

i = the index (ranking or position of a data value)

n = the total number of data
• Order the data from smallest to largest.

• Calculate $i = \frac{k}{100}(n + 1)$

• If i is an integer, then the kth percentile is the data value in the ith position in the ordered set of data.

• If i is not an integer, then round i up and round i down to the nearest integers. Average the two data values in these two positions in the ordered data set.
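The quoted steps can be sketched in Python as follows. This is only an illustration: the function name `textbook_percentile` and the sample data are made up, and clamping positions that fall outside 1..n is an assumption, since the quoted rule does not cover that case.

```python
import math

def textbook_percentile(data, k):
    """kth percentile via the quoted rule i = (k/100)(n + 1)."""
    ordered = sorted(data)              # step 1: order smallest to largest
    n = len(ordered)
    # Multiply before dividing so that whole-number positions stay exact.
    i = k * (n + 1) / 100               # 1-based position, possibly fractional
    # Assumption: clamp to 1..n when i falls outside the data positions.
    lo = min(max(math.floor(i), 1), n)
    hi = min(max(math.ceil(i), 1), n)
    # If i is an integer, lo == hi and the average is just that data value.
    return (ordered[lo - 1] + ordered[hi - 1]) / 2

sample = [3, 5, 7, 8, 12, 13, 14, 18, 21]    # hypothetical data, n = 9
print(textbook_percentile(sample, 50))  # i = 5: the 5th value
print(textbook_percentile(sample, 25))  # i = 2.5: average of the 2nd and 3rd values
```

With n = 9 the position for k = 50 is exactly 5, so the result is the middle value; for k = 25 the position is 2.5, so the 2nd and 3rd values are averaged.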

Best Answer

One way to think about the formula is to solve it for $k$, which gives the basic definition of a percentage: the position $i$ as a share of $n+1$. $$k = \frac{i}{n+1} \cdot 100$$ The denominator is $n+1$ rather than $n$ simply as a matter of indexing (there could be a $0^{\text{th}}$ percentile, at $i = 0$).
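One reading of this, offered as a sketch rather than as the textbook's own justification: the n ordered values cut the number line into n + 1 gaps, so positions 1 through n land strictly between the 0th and 100th percentiles. The helper name `percentile_of_position` is hypothetical.

```python
def percentile_of_position(i, n):
    """Percentile k assigned to the i-th smallest of n values: k = 100 * i / (n + 1)."""
    return 100 * i / (n + 1)

n = 9
# Positions 1..9 are spread evenly; no data value sits at 0 or at 100.
print([percentile_of_position(i, n) for i in range(1, n + 1)])
# The end points correspond to the hypothetical positions 0 and n + 1.
print(percentile_of_position(0, n), percentile_of_position(n + 1, n))
```

Had the denominator been n, the largest value would sit at the 100th percentile while nothing would sit at the 0th, which treats the two ends of the data unevenly.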

Related Solutions

[Math] How to find percentiles of data sets (Even vs odd)

As noted in the other responses, there is disagreement.

In practice, with such a small data set, using percentiles isn't particularly useful. So it's really an artificial test question.

I'd say if the test question allows free-form answer, you should demonstrate that you know the general concept of percentiles, and also that you know it's controversial. The answer then would be "either 113, or 109, or 105, depending on the calculation method chosen."

If the question is multiple choice, then there are a few possibilities:

a) A thoughtful test author will not include more than one of 113/109/105 among the permitted choices. Easy.

b) A dogmatic test author who is your own teacher will have given a specific definition in class and will expect you to follow that definition. Predictable at least.

c) A dogmatic test author who is not your own teacher, and who offers more than one of the three choices, puts you in an unwinnable position. Take your best guess, and if it matters enough, be prepared to appeal.

[Math] the correct way to obtain the position of a quartile and its value

Data: If you are finding the lower quartile of a sample, you should know that various textbooks and software programs use slightly different definitions. Roughly speaking, the lower quartile separates the lower quarter of the sorted data from the upper three quarters. Differences among the specific definitions arise most clearly when the sample size is not divisible by $4.$

For example, consider the sample of $n = 25$ observations shown below sorted from smallest to largest.

x
 [1]  66  82  86  91  95  95  96  96  96  97 101 104
[13] 104 106 106 107 108 108 109 111 111 112 112 120
[25] 125

The median is 104 (the 13th observation in order). The 7th observation is 96. It has 'not more than 25% of the observations below' it (precisely, $6/25 = 24\%$) and 'not more than 75% of the observations above' it (precisely, $16/25 = 64\%$, taking tied values into account). So many textbooks would say the lower quartile is 96.
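The counts behind these two fractions can be checked directly. A short Python sketch, with the data copied from the listing above and "below" and "above" taken to mean strictly smaller and strictly larger (an assumption about how ties are handled):

```python
x = [66, 82, 86, 91, 95, 95, 96, 96, 96, 97, 101, 104,
     104, 106, 106, 107, 108, 108, 109, 111, 111, 112, 112, 120, 125]

n = len(x)
q = sorted(x)[6]                  # 7th observation (0-based index 6)
below = sum(v < q for v in x)     # values strictly smaller than q
above = sum(v > q for v in x)     # values strictly larger than q
print(q, below, above, below / n, above / n)
```

The three tied values of 96 count toward neither side, which is why the two fractions do not add up to 24/25.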

According to the default method in R statistical software, the lower quartile (25th percentile) is 96, as shown below:

quantile(x, .25)
25% 
 96

However, for compatibility with other software, R has nine different 'types' of definitions that you can request. They make different compromises when the sample size is not divisible by 4 and when there are ties in the vicinity of the lower quartile. Here are a few that give different answers:

quantile(x, .25, type=3)
25% 
 95 
quantile(x, .25, type=4)
 25% 
95.25 
quantile(x, .25, type=5)
 25% 
95.75   
quantile(x, .25, type=8)
 25% 
95.66667
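To see where these numbers come from, here is a rough Python sketch of four of the interpolating types (4, 5, 7 and 8). It follows the position-plus-offset description in the documentation for R's `quantile`, but it is not R's actual code: the names `OFFSETS` and `sample_quantile` are made up, the clamping at the ends is an assumption, and R's small floating-point tolerance is left out.

```python
import math

x = [66, 82, 86, 91, 95, 95, 96, 96, 96, 97, 101, 104,
     104, 106, 106, 107, 108, 108, 109, 111, 111, 112, 112, 120, 125]

# Offset m added to n*p for each type; p is the requested probability.
OFFSETS = {
    4: lambda p: 0.0,
    5: lambda p: 0.5,
    7: lambda p: 1.0 - p,            # R's default type
    8: lambda p: (p + 1.0) / 3.0,
}

def sample_quantile(data, p, qtype=7):
    """Interpolate linearly between order statistics j and j + 1."""
    s = sorted(data)
    n = len(s)
    h = n * p + OFFSETS[qtype](p)    # fractional 1-based position
    j = math.floor(h)
    g = h - j                        # weight given to the upper neighbour
    # Assumption: clamp positions to 1..n at the extremes.
    lo = s[min(max(j, 1), n) - 1]
    hi = s[min(max(j + 1, 1), n) - 1]
    return lo + g * (hi - lo)

for t in (4, 5, 7, 8):
    print(t, round(sample_quantile(x, 0.25, t), 5))
```

For this sample every type lands between the 6th value (95) and the 7th value (96); the types differ only in the offset, and therefore in the weight placed on each neighbour.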

Differences among the definitions of quantiles (including quartiles) may seem important in small datasets, but in practical applications, quantiles are most often used for very large datasets where the differences are usually not consequential.

Continuous distributions: If you are dealing with continuous probability distributions, then the 25th percentile is usually unique. Here are the lower quartiles of the distributions $\mathsf{Norm}(\mu=0, \sigma=1)$ and $\mathsf{Exp}(rate = 1):$

> qnorm(.25, 0, 1)
[1] -0.6744898
> qexp(.25, 1)
[1] 0.2876821
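The same two values can be reproduced outside R. A small Python sketch using only the standard library: the normal quantile comes from `statistics.NormalDist.inv_cdf`, and the exponential one from solving $1 - e^{-x} = p$ by hand, giving $x = -\ln(1 - p)$.

```python
import math
from statistics import NormalDist

p = 0.25
z = NormalDist(mu=0, sigma=1).inv_cdf(p)   # lower quartile of Norm(0, 1)
e = -math.log(1 - p)                       # lower quartile of Exp(rate = 1)
print(round(z, 7), round(e, 7))
```

In both cases the CDF is continuous and strictly increasing on the support, so exactly one point has probability 0.25 to its left.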

Discrete distributions: For discrete distributions such as binomial and Poisson, it is not usually possible to find an integer value that cuts off exactly the lower quarter of the probability in the distribution. Below are results for $\mathsf{Binom}(n=5, p=1/3)$ and $\mathsf{Pois}(\lambda = 5).$ The R functions pbinom and ppois designate CDFs of binomial and Poisson distributions respectively.

qbinom(.25, 5, 1/3)
[1] 1                # 25th percentile is 1
pbinom(1, 5, 1/3)
[1] 0.4609053        # P(X <= 1) = .4609 > .25
pbinom(0, 5, 1/3)
[1] 0.1316872        # P(X <= 0) = .1317 < .25

qpois(.25, 5)
[1] 3
ppois(3, 5)
[1] 0.2650259
ppois(2, 5)
[1] 0.124652
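The rule behind these results is that the quantile of a discrete distribution is the smallest value whose CDF is at least the requested probability. A minimal Python sketch of that rule follows; the function names and the `max_x` search cap are made up for illustration, and this is not how R implements `qbinom` or `qpois` internally.

```python
import math

def binom_pmf(x, n, p):
    """P(X = x) for Binom(n, p); math.comb returns 0 when x > n."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def pois_pmf(x, lam):
    """P(X = x) for Pois(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)

def discrete_quantile(pmf, q, max_x=1000):
    """Smallest x in 0, 1, 2, ... whose cumulative probability is at least q."""
    cdf = 0.0
    for x in range(max_x + 1):
        cdf += pmf(x)
        if cdf >= q:
            return x
    raise ValueError("quantile not reached; increase max_x")

print(discrete_quantile(lambda x: binom_pmf(x, 5, 1/3), 0.25))
print(discrete_quantile(lambda x: pois_pmf(x, 5), 0.25))
```

Because the CDF jumps from below 0.25 to above 0.25 at a single integer, no integer cuts off exactly a quarter of the probability, and the rule returns the first integer at or past the jump.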
