A coworker asked me how to calculate a 25th percentile. I gave him an answer, but then became unsure whether I had figured it correctly. The problem is that our sample size will tend to be quite small, so by definition there is little point in calculating one. However, we need to set up a semi-scientific computation anyway. 🙂
My question is as follows. If we assume, with a huge amount of goodwill, that the following sample:
$$
s_1 := \{ 72, 88, 100 \}
$$
has a 25th percentile of 80 (the midpoint of 72 and 88), or even 76 (a quarter of the way from 72 to 88), should the value of that percentile be affected if we increase the maximum value, as in the following sample?
$$
s_2 := \{ 72, 88, 200 \}
$$
Best Answer
With a sample size of three, it is meaningless to ask for a "percentile".
By definition, the 25th percentile is "any value relative to which 25% of the observed values are lower (or greater)".
With a sample size of three, the fraction of the sample lying below (or above) any given value can only be (approximately) 0%, 33%, 67%, or 100%.
In general, it is very difficult to get exactly 25% of the sample above or below the line (for starters, this requires the sample size to be divisible by 4). So in practice there are several different methods for computing percentiles, and there is no "one right answer". (The methods generally agree for large sample sizes.) Irrespective of the method, as long as your percentile line is drawn sufficiently far from the extreme values (which is generally the case if the percentile $P$ and the number of samples $N$ are such that $P/100 \cdot N$ and $(100-P)/100 \cdot N$ are both much bigger than 1), your percentile numbers will not depend on the extreme values.
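To make the rule of thumb concrete, here is a minimal Python sketch that evaluates the two quantities for the question's sample size and for a large one. The function name `tail_counts` is just an illustrative choice, not something from a library.

```python
def tail_counts(p, n):
    """Return (P/100 * N, (100 - P)/100 * N) for percentile p and sample size n.

    These are the rough numbers of observations expected below and above
    the percentile line; the rule of thumb wants both to be much bigger than 1.
    """
    below = p / 100 * n
    above = (100 - p) / 100 * n
    return below, above


for n in (3, 1000):
    below, above = tail_counts(25, n)
    print(f"N={n}: P/100*N = {below}, (100-P)/100*N = {above}")
```

For $N = 3$ and $P = 25$ the two values are 0.75 and 2.25, neither of which is much bigger than 1, so the condition is not met for the samples in the question.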
When the sample size is too small, the extreme values can affect the result. How, and by how much, this effect comes into play depends on the method you use to compute the percentile line.
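As a rough illustration, the sketch below applies two common conventions to the two samples from the question: linear interpolation between the closest ranks, and the nearest-rank rule. The function names are hypothetical, and these are only two of several conventions in use, so a given statistics package may well return something different.

```python
import math


def percentile_linear(sample, p):
    """Linear interpolation at fractional index (N - 1) * p / 100 (0-based)."""
    xs = sorted(sample)
    h = (len(xs) - 1) * p / 100
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)  # clamp so p = 100 stays in range
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def percentile_nearest_rank(sample, p):
    """Nearest-rank rule: the value at 1-based rank ceil(p / 100 * N)."""
    xs = sorted(sample)
    rank = max(1, math.ceil(p / 100 * len(xs)))
    return xs[rank - 1]


s1 = [72, 88, 100]
s2 = [72, 88, 200]

for p in (25, 75):
    for name, f in (("linear", percentile_linear),
                    ("nearest rank", percentile_nearest_rank)):
        print(f"P={p}, {name}: s1 -> {f(s1, p)}, s2 -> {f(s2, p)}")
```

Under both of these conventions the 25th percentile is the same for $s_1$ and $s_2$ (80 with interpolation, 72 with nearest rank), because only the two smallest values enter the computation. The 75th percentile, by contrast, moves with the maximum: 94 versus 144 with interpolation, and 100 versus 200 with nearest rank.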