Solved – Computing average value ignoring outliers

outliers, standard deviation

This is more of a general statistics question, though if it matters I'm writing PHP code.

Let's say I'm trying to compute the average value of a toy that is commonly bought and sold on the secondary market, and I have a set of price values culled both from auctions and from user-entered "price paid" data. The data points that represent auctions are pretty reliable, but I also get the occasional "garage sale" type of data point, where someone may have paid a buck to buy something from Aunt Polly at a garage sale. The problem is that the $1 type of data points aren't useful to me, because they don't really indicate value: Aunt Polly didn't know any better, and didn't care. Similarly, I may occasionally get a data point from a jokester entering $9000 for a toy that is really only worth $9.

So, when computing value, what's the best way to factor these types of anomalies out of otherwise useful data?

I've read about outliers, and something about generally ignoring anything that is more than 2.5 standard deviations outside the rest of the data, but I'm looking for the full recipe, here.

Thanks so much!

Best Answer

In boxplots, a value is typically considered an outlier if it lies more than 1.5 times the IQR (interquartile range, the difference between the first and third quartiles, Q1 and Q3) beyond the nearer quartile, in the direction away from the median. In other words, anything below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR is flagged.

I cannot say whether this is an appropriate measure for your data, though...
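
For illustration, here is a minimal sketch of the boxplot rule applied before averaging. It is written in Python rather than PHP, but the logic should port directly. The function name `iqr_filtered_mean`, the parameter `k`, and the sample prices are all made up for this example, and the quartiles use the standard library's "inclusive" method, which is only one of several common quartile definitions (other definitions can give slightly different fences on small samples).

```python
from statistics import mean, quantiles


def iqr_filtered_mean(prices, k=1.5):
    """Average the prices that fall within k * IQR of the quartiles.

    Hypothetical helper for illustration; needs at least two prices
    so that quartiles can be computed.
    """
    # Q1 and Q3 are the first and third of the three quartile cut points.
    q1, _, q3 = quantiles(prices, n=4, method="inclusive")
    iqr = q3 - q1

    # Fences: anything outside [low, high] is treated as an outlier.
    low = q1 - k * iqr
    high = q3 + k * iqr

    kept = [p for p in prices if low <= p <= high]
    return mean(kept), kept


# Made-up sample: typical prices near $10, one garage-sale $1, one joke $9000.
prices = [9, 10, 8, 11, 9, 12, 10, 1, 9000]
avg, kept = iqr_filtered_mean(prices)

print(kept)           # [9, 10, 8, 11, 9, 12, 10]
print(round(avg, 2))  # 9.86
```

On this sample the quartiles are 9 and 11, so the fences are 6 and 14, and both the $1 and the $9000 entries are dropped. Whether 1.5 is the right multiplier for real price data is a judgment call, as noted above.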
