Mathematical Statistics – How to Calculate the 50th Percentile in a Small Data Set

mathematical-statistics, quantiles

I'm trying to learn the concept of percentile.

Question: Given these numbers: {1, 2, 3, 900}, I'm trying to calculate the 50th percentile.

My answer: 3. But different websites say 2.5.

My reasoning: Two values (1 and 2) are below the value 3. There are 4 values overall in the data set, so 50% (2 out of 4) of the values are smaller than 3.

I'm using Wikipedia's definition:

A percentile is a measure indicating the value below which a given percentage of observations in a group of observations falls

What am I missing?

Best Answer

The Wikipedia wording isn't wildly wrong but it doesn't give a precise rule, which is what you need.

Consider this variant on your argument. Two of the numbers 1, 2, 3, 900 are above 2. There are 4 values in total, so 50% are larger than 2. So choose 2 as the answer.

What is reported as the middlemost (a word Galton used) value should not depend on whether you start at the lowest value and work up or start at the highest value and work down. There is a clear answer either way if the number of values is odd, but we need a rule for when the number of values is even, as it is here with 4.
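The two counting arguments can be checked directly. This is only a minimal sketch in Python; the helper names `fraction_below` and `fraction_above` are made up for illustration and are not from any library.

```python
data = [1, 2, 3, 900]

def fraction_below(values, x):
    """Share of values strictly smaller than x (hypothetical helper)."""
    return sum(v < x for v in values) / len(values)

def fraction_above(values, x):
    """Share of values strictly larger than x (hypothetical helper)."""
    return sum(v > x for v in values) / len(values)

# Working up from the bottom: half of the values lie below 3.
below_3 = fraction_below(data, 3)

# Working down from the top: half of the values lie above 2.
above_2 = fraction_above(data, 2)

print(below_3, above_2)
```

Both fractions come out as one half, so the "50% below" argument and the "50% above" argument are equally valid and yet point to different values (3 and 2).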

With an even number of values, using the midpoint between the two middle values (the "comedians", naturally) as the median or 50th percentile is explained as a convention to mathematical audiences and as a rule to everybody else.
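As a small illustration, Python's standard `statistics` module exposes all three candidates for an even-sized sample: the lower middle value, the upper middle value, and the conventional midpoint. The variable names are chosen here for illustration only.

```python
import statistics

data = [1, 2, 3, 900]

# The two middle values of the sorted data.
low = statistics.median_low(data)    # lower middle value
high = statistics.median_high(data)  # upper middle value

# The usual convention: the midpoint of the two middle values.
mid = statistics.median(data)

print(low, high, mid)  # 2 3 2.5
```

The value 3 from the question and the value 2 from the variant argument are the two middle values; the 2.5 reported by the websites is their midpoint.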

NB: Which calculation rule to use for arbitrary percentiles is (surprisingly, perhaps) wide open territory, with, on one count, nine different ways to do it. That is well covered in other threads. Here I focus on the small fallacy exposed in the question.
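To get a feel for how much the rules can disagree, here is a hedged sketch using `statistics.quantiles` from the Python standard library (Python 3.8 or later). It offers only two of the possible rules, `"exclusive"` and `"inclusive"`, so it shows the disagreement but not the full range of methods.

```python
import statistics

data = [1, 2, 3, 900]

# Quartile cut points (25th, 50th, 75th percentiles) under two different rules.
exclusive = statistics.quantiles(data, n=4, method="exclusive")
inclusive = statistics.quantiles(data, n=4, method="inclusive")

print(exclusive)
print(inclusive)
```

For this data set both rules agree on 2.5 for the 50th percentile, but they give different answers for the 25th and 75th percentiles, and with a value like 900 in the data the gap at the upper quartile is large.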
