[Math] the correct way to obtain the position of a quartile and its value

probability, statistics

I'm studying statistics, specifically measures of position.

The problem is that I have seen many ways to calculate the quartile, quintile, percentile, decile, etc.

The main formulas that I have seen to calculate the POSITION of a quartile are:

$\frac{(n+1)i}{4}$ and $\frac{ni}{4}$, where $n$ is the number of data points and $i \in \{1, 2, 3\}$ is the quartile number.

Some websites claim that one formula is for an odd number of data points and the other for an even number. However, many exercises use the formula $\frac{(n+1)i}{4}$ for both even and odd sample sizes, without distinction.

What has worked for me in the exercises so far is to obtain the POSITION of the quartile with the formula $\frac{(n+1)i}{4}$, without distinguishing between odd and even sample sizes. To obtain the VALUE of the quartile, I then have two cases:

  • If the position of the quartile is a decimal, the VALUE is the average of the data values at the two nearest integer positions.

  • If the position of the quartile is an INTEGER, the VALUE is the data value at that position.
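
The rule described above can be written out as a short R sketch. The helper name `quartile_by_rule` and the two small samples are made up for illustration, and the function assumes at least three data points so that every position falls between 1 and $n$.

```r
# Sketch of the (n + 1) * i / 4 rule; quartile_by_rule is a hypothetical helper.
quartile_by_rule <- function(data, i) {
  stopifnot(i %in% 1:3, length(data) >= 3)
  x <- sort(data)
  n <- length(x)
  pos <- (n + 1) * i / 4          # POSITION of the i-th quartile
  lo <- floor(pos)
  hi <- ceiling(pos)
  if (lo == hi) {
    x[lo]                         # integer position: take that data value
  } else {
    (x[lo] + x[hi]) / 2           # fractional position: average the two neighbours
  }
}

odd_sample  <- c(2, 4, 4, 5, 7, 9, 10)       # n = 7, positions 2, 4, 6
even_sample <- c(2, 4, 4, 5, 7, 9, 10, 12)   # n = 8, positions 2.25, 4.5, 6.75

sapply(1:3, function(i) quartile_by_rule(odd_sample, i))   # 4 5 9
sapply(1:3, function(i) quartile_by_rule(even_sample, i))  # 4 6 9.5
```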

I have the same question about the formulas for quintiles, deciles, and percentiles.

This approach has worked well for me so far, but I would like to ask: is the way I am obtaining the POSITION and VALUE of these measures of position correct? Thanks in advance.

Best Answer

Data: If you are finding the lower quartile of a sample, you should know that various textbooks and software programs use slightly different definitions. Roughly speaking, the lower quartile separates the lower quarter of the sorted data from the upper three quarters. Differences among the specific definitions arise most clearly when the sample size is not divisible by $4.$

For example, consider the sample of $n = 25$ observations shown below sorted from smallest to largest.

x
 [1]  66  82  86  91  95  95  96  96  96  97 101 104
[13] 104 106 106 107 108 108 109 111 111 112 112 120
[25] 125

The median is 104 (the 13th observation in order). The 7th observation is 96. It has 'not more than 25% of the observations below': precisely, $6/25 = 24\%;$ and it has 'not more than 75% of the observations above': precisely, $16/25 = 64\%$ (taking tied values into account). So many textbooks would say the lower quartile is 96.
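
The two proportions can be checked directly from the sample. This is a minimal sketch that re-enters the 25 observations; the variable names `below` and `above` are only for illustration.

```r
x <- c(66, 82, 86, 91, 95, 95, 96, 96, 96, 97, 101, 104,
       104, 106, 106, 107, 108, 108, 109, 111, 111, 112, 112, 120,
       125)
n <- length(x)

below <- sum(x < 96)   # observations strictly below 96: 6
above <- sum(x > 96)   # observations strictly above 96: 16

below / n              # 0.24
above / n              # 0.64
```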

According to the default method (type 7) in R statistical software, the lower quartile (25th percentile) is 96, as shown below:

quantile(x, .25)
25% 
 96 

However, for compatibility with other software, R has nine different 'types' of definitions that you can request. They make different compromises when the sample size is not divisible by 4 and when there are ties in the vicinity of the lower quartile. Here are a few that give different answers:

quantile(x, .25, type=3)
25% 
 95 
quantile(x, .25, type=4)
 25% 
95.25 
quantile(x, .25, type=5)
 25% 
95.75   
quantile(x, .25, type=8)
 25% 
95.66667 
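
To see where these numbers come from, here is a sketch of the interpolation scheme that R's documentation describes for types 4 to 9: compute $h = np + m$, take $j = \lfloor h \rfloor$ and $g = h - j$, and return $(1-g)\,x_{(j)} + g\,x_{(j+1)}$, where the offset $m$ depends on the type. The helper name `interp_quantile` is hypothetical, and the sketch leaves out the floating-point safeguards in R's own implementation.

```r
x <- c(66, 82, 86, 91, 95, 95, 96, 96, 96, 97, 101, 104,
       104, 106, 106, 107, 108, 108, 109, 111, 111, 112, 112, 120,
       125)

# Hypothetical helper: interpolated sample quantile with offset m.
interp_quantile <- function(x, p, m) {
  x <- sort(x)
  n <- length(x)
  h <- n * p + m
  j <- floor(h)
  g <- h - j
  lo <- x[min(max(j, 1), n)]       # clamp both indices to 1..n
  hi <- x[min(max(j + 1, 1), n)]
  (1 - g) * lo + g * hi
}

p <- 0.25
# Offsets m for the types used in this answer (default type 7 included).
offsets <- c(type4 = 0, type5 = 0.5, type7 = 1 - p, type8 = (p + 1) / 3)

by_hand  <- sapply(offsets, function(m) interp_quantile(x, p, m))
built_in <- sapply(c(4, 5, 7, 8),
                   function(t) quantile(x, p, type = t, names = FALSE))

by_hand    # 95.25 95.75 96 95.66667
```

For this sample the hand computation agrees with `quantile` for each of the four types; type 3 is not covered because it picks a single order statistic rather than interpolating.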

Differences among the definitions of quantiles (including quartiles) may seem important in small datasets, but in practical applications, quantiles are most often used for very large datasets where the differences are usually not consequential.
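
As a rough illustration of this point, the sketch below compares all nine types on a large simulated sample. The standard normal data, the seed, and the sample size of 100,000 are arbitrary assumptions, not part of the example above.

```r
set.seed(2024)                 # arbitrary seed, for reproducibility only
y <- rnorm(1e5)                # simulated sample, n = 100,000

# Lower quartile under each of R's nine definitions.
q_all <- sapply(1:9, function(t) quantile(y, 0.25, type = t, names = FALSE))

spread <- diff(range(q_all))   # largest disagreement among the nine types
spread
```

With a sample this large the nine answers should agree to several decimal places, whereas the types shown for the 25-observation sample range from 95 to 96.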


Continuous distributions: If you are dealing with continuous probability distributions, then the 25th percentile is usually unique. Here are the lower quartiles of the distributions $\mathsf{Norm}(\mu=0, \sigma=1)$ and $\mathsf{Exp}(rate = 1):$

qnorm(.25, 0, 1)
[1] -0.6744898
qexp(.25, 1)
[1] 0.2876821
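
A quick way to sanity-check such values is to feed them back into the CDF, which should return 0.25. For the exponential case the quantile also has the closed form $-\ln(1 - p)/\text{rate}$, which the sketch below compares against `qexp`; the variable names are only illustrative.

```r
p <- 0.25
q_norm <- qnorm(p, mean = 0, sd = 1)
q_exp  <- qexp(p, rate = 1)

pnorm(q_norm, mean = 0, sd = 1)   # 0.25
pexp(q_exp, rate = 1)             # 0.25

closed_form <- -log(1 - p) / 1    # -ln(1 - p) / rate, with rate = 1
all.equal(q_exp, closed_form)     # TRUE
```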

Discrete distributions: For discrete distributions such as binomial and Poisson, it is not usually possible to find an integer value that cuts off exactly the lower quarter of the probability in the distribution. Below are results for $\mathsf{Binom}(n=5, p=1/3)$ and $\mathsf{Pois}(\lambda = 5).$ The R functions qbinom and qpois return quantiles, while pbinom and ppois are the CDFs of the binomial and Poisson distributions, respectively.

qbinom(.25, 5, 1/3)
[1] 1                # 25th percentile is 1
pbinom(1, 5, 1/3)
[1] 0.4609053        # P(X <= 1) = .4609 > .25
pbinom(0, 5, 1/3)
[1] 0.1316872        # P(X <= 0) = .1317 < .25

qpois(.25, 5)
[1] 3                # 25th percentile is 3
ppois(3, 5)
[1] 0.2650259        # P(X <= 3) = .2650 > .25
ppois(2, 5)
[1] 0.124652         # P(X <= 2) = .1247 < .25
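
R's help pages for these distributions define the quantile as the smallest value $x$ with $F(x) \ge p$, which is why the returned value overshoots 25% in both cases. A minimal sketch of that search follows; `smallest_x` is a hypothetical helper and the upper bound `max_x` is an arbitrary assumption.

```r
# Hypothetical helper: smallest non-negative integer k with cdf(k) >= p.
smallest_x <- function(cdf, p, max_x = 1000) {
  for (k in 0:max_x) {
    if (cdf(k) >= p) return(k)
  }
  NA_integer_                     # no such k found up to max_x
}

q_binom <- smallest_x(function(k) pbinom(k, 5, 1/3), 0.25)
q_pois  <- smallest_x(function(k) ppois(k, 5), 0.25)

q_binom   # 1
q_pois    # 3
```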