Detecting Outliers in Very Small Datasets – Techniques and Strategies

outliers

I have a data set that contains the response times experienced by a user visiting a web application.
For example, a visitor enters www.test.com in the browser and navigates through this domain, viewing child pages like www.test.com/news, www.test.com/overview, www.test.com/overview/current, etc. Each page view is called a user action.

Let's say a user has performed 5 user actions with the response times 200ms, 500ms, 350ms, 1200ms, 154ms. Now I want to find the outliers that represent unusually fast or unusually slow page loads. Is that possible?

Thanks

EDIT: I want to detect outliers because I want to determine the user experience depending on the response time. Let's say I have three ux-states, namely happy, ok and unhappy. All user actions are ok except the outliers. They are either unhappy if the response time is too high or happy if the response time is very low.

Best Answer

It depends on how you want to define an outlier, since there is no single definition of this concept. One of the more common definitions, though, treats any observation outside the region $$ [ \pi_{.25} - 1.5 \times \mathrm{IQR}\,, \; \pi_{.75} + 1.5 \times \mathrm{IQR} ] $$ as an outlier, where $\pi_{.25}$ and $\pi_{.75}$ are the 25th and 75th percentiles, respectively, and $\mathrm{IQR}$ is the interquartile range, i.e. $\pi_{.75} - \pi_{.25}$. Of course, this region may be too wide or too narrow for a dataset of only 5 observations, but that is an inherent problem of trying to define an outlier from a small sample: with so little information, it is hard to get a feel for the true distribution you are sampling from.
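As a rough illustration, the rule above can be applied to the five response times from the question. This is only a sketch under stated assumptions: the function names `iqr_fences` and `classify` are made up for this example, the quartiles use the "inclusive" interpolation method from Python's `statistics` module (other quartile conventions give different fences on a sample this small), and the mapping of low and high outliers to "happy" and "unhappy" follows the question's edit.

```python
from statistics import quantiles


def iqr_fences(times_ms, k=1.5):
    """Return (lower, upper) bounds of the region [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: quartiles are computed with the 'inclusive' method;
    a different convention would shift the bounds.
    """
    q1, _median, q3 = quantiles(times_ms, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def classify(times_ms, k=1.5):
    """Label each response time as 'happy', 'ok' or 'unhappy' (hypothetical helper)."""
    lower, upper = iqr_fences(times_ms, k)
    labels = []
    for t in times_ms:
        if t < lower:
            labels.append("happy")      # unusually fast page load
        elif t > upper:
            labels.append("unhappy")    # unusually slow page load
        else:
            labels.append("ok")
    return labels


times = [200, 500, 350, 1200, 154]
print(iqr_fences(times))  # (-250.0, 950.0)
print(classify(times))    # ['ok', 'ok', 'ok', 'unhappy', 'ok']
```

With these five values the lower bound is negative, so no response time could ever be flagged as unusually fast, while 1200ms falls above the upper bound. That is one concrete way the region can turn out too wide on such a small sample.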
