What Conclusions Can I Draw from Percentiles About the Set of Data

data analysis, percentile, statistics

I am analyzing a specific survey question for the topic "Should High Schools Implement a Uniform Policy?". I am supposed to create biased survey questions that try to convince respondents to agree with the proposal. The survey question asks how much time a student takes to pick, plan, and accessorize their outfit for school every morning. I surveyed 37 people in total, and since this is one of the quantitative questions, I am determining the percentiles for it (no normal distribution is assumed here). The frequency table is
\begin{array}{c|c|c|c}
\text{Planning} \ (x)&\text{Number of students} \ (f)&\text{Cumulative frequency} \ (F)\\
\hline
2.5&10&10\\
8&10&20\\
13&7&27\\
18&6&33\\
23&1&34\\
28&2&36\\
45.5&1&37\\
\hline
\text{Total}&37
\end{array}

Here, Planning (x) is the planning time in minutes, and the Number of students (f) is the frequency of responses for planning outfits.

I am finding the percentiles for 16 – 20 minutes of planning. There are 27 values below the 16 – 20 minute range, and the range itself contains 6 values, for a total of 33 values. So, ordering the responses from the shortest time to the longest, this range covers the 28th through the 33rd values. Thus, I can say the percentile of these values is between (27/37)×100% and (33/37)×100%, which gives 73% – 89%. I can't give an exact answer as there are multiple values in the range. I am required to draw further conclusions from this, but 16 – 20 minutes of planning is roughly the middle range of times, since the majority take 15 minutes or less and fewer people take 21 – 60 minutes (all of this can be seen in the table). My question is, what more can I say about my percentiles, and how can I draw conclusions efficiently without sounding confusing and wordy? I originally had something like this:

Since we know that six people answered 16-20 minutes as their preparation time for their outfits for school in the morning, we can conclude that very few people out of the 37 respondents take over 10 minutes to prepare their outfits (57%). The students might wake up early to spare themselves some time to fit in preparing their outfit, showering, getting ready, and eating breakfast without having 10 minutes or any less additional time, which would not be a huge deal. However, other students don't have much time to get ready, much less do anything else; thus, any additional time might turn out to be a problem.

Best Answer

Let's add one more column to your table. This is the cumulative frequency, but expressed as a percentage of the total number of responses rather than as a raw number.

\begin{array}{c|c|c|c}
\text{Time (minutes)}&\text{Responses}&\text{Cumulative frequency}&\text{Cumulative percentage}\\
\hline
0-5&10&10&27.0\%\\
6-10&10&20&54.1\%\\
11-15&7&27&73.0\%\\
16-20&6&33&89.2\%\\
21-25&1&34&91.9\%\\
26-30&2&36&97.3\%\\
31-60&1&37&100\%\\
\hline
\text{Total}&37
\end{array}

This tells you that $54.1\%$ of the students spend $10$ minutes or less. A student who spends $11$ minutes is therefore in the $54$th percentile, or the $54$th percentile is $11$ minutes. (At least, that's the usual convention for percentiles. It's actually a bit contrary to the way other statistics such as quartiles or quintiles are named, where the lowest-valued one is always first; instead, a student who spent $0$ minutes would be in the $0$th percentile.)

In the same way, since $73\%$ spend $15$ minutes or less, the $73$rd percentile is $16.$ The $89$th percentile is $21.$ The student who spent more than $30$ minutes is in the $97$th percentile.

Of course, if $73\%$ take less than $16$ minutes, then $27\%$ take $16$ minutes or longer.
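The cumulative percentages at the bin boundaries can be recomputed with a short script. This is only a sketch: the bin labels and counts are copied from the table, while the function name `cumulative_percentages` and the variable names are made up for illustration.

```python
# Bin labels and response counts copied from the frequency table.
bins = ["0-5", "6-10", "11-15", "16-20", "21-25", "26-30", "31-60"]
counts = [10, 10, 7, 6, 1, 2, 1]


def cumulative_percentages(counts):
    """Return the running total of counts as percentages of the grand total."""
    total = sum(counts)
    running = 0
    result = []
    for c in counts:
        running += c
        result.append(100 * running / total)
    return result


cum_pct = cumulative_percentages(counts)

# For each bin, show the share of students at or below its upper boundary
# and the complementary share above it.
for label, pct in zip(bins, cum_pct):
    print(f"{label:>6} min: {pct:5.1f}% at or below, {100 - pct:5.1f}% above")
```

The "above" column is the complement used in the last paragraph: whatever share is at or below a boundary, the rest of the students take longer.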

For times that are not at the boundary of a bin, one technique is to assume uniform distribution of responses within the bin and compute percentiles from that. For example, $16.2\%$ of responses are in the range $16$ to $20,$ inclusive. So we might imagine that $3.24\%$ give each of the five responses in that range. This is somewhat unrealistic, since $3.24\%$ of responses would be $1.2$ students, but you have to do something to account for the fact that the number of students in that bin doesn't divide evenly over the number of minutes. Then we could say that $(73.0 + 2\times3.24)\% \approx 79.5\%$ of students take less than $18$ minutes ($73.0\%$ for $15$ minutes or less, $2\times3.24\%$ for $16$ or $17$ minutes). That is, $18$ minutes is in the $79$th percentile.
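The uniform-within-bin estimate can be sketched in code as well. The function `percent_below` and the `edges` list are hypothetical names introduced here, and the code assumes, as the paragraph above does, that responses are whole minutes spread evenly across each bin.

```python
def percent_below(minutes, edges, counts):
    """Estimate the percentage of responses strictly below `minutes`.

    Assumes whole-minute responses spread evenly within each bin
    (an illustrative assumption, not something the survey data confirms).
    """
    total = sum(counts)
    below = 0.0
    for (lo, hi), c in zip(edges, counts):
        width = hi - lo + 1  # number of whole-minute values in the bin
        if minutes > hi:
            below += c  # the entire bin lies below the cutoff
        elif minutes > lo:
            below += c * (minutes - lo) / width  # partial share of the bin
    return 100 * below / total


# Inclusive bin edges and counts copied from the frequency table.
edges = [(0, 5), (6, 10), (11, 15), (16, 20), (21, 25), (26, 30), (31, 60)]
counts = [10, 10, 7, 6, 1, 2, 1]

print(f"{percent_below(18, edges, counts):.1f}% take less than 18 minutes")
```

For $18$ minutes this gives a value between $79\%$ and $80\%$, in line with the hand calculation up to rounding. At a bin boundary such as $16$ or $21$ it reduces to the cumulative percentage from the table.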

That's getting rather technical for little purpose, however. You can probably make your best possible arguments by concentrating on the boundaries between the bins, such as $16$ minutes or $21$ minutes.
