Solved – How to detect outliers in a skewed data set

data mining, data transformation, dataset, outliers

I am working on a data mining project for school. In the preprocessing stage I need to remove outliers from my data set, which is positively skewed (see the description below). My idea is to remove all values larger than mean + 3 × standard deviation, but I am not sure this is a suitable technique for my case because the data are not normally distributed. What technique should I use?

  var     n    mean      sd  median trimmed     mad  min     max   range skew kurtosis   se
1   1 41019 1668.99 1107.08 1453.68 1524.22 1026.05 10.9 5920.74 5909.84 1.18     1.33 5.47

Best Answer

The bottom line is that the decision to remove data from your dataset is a subject-matter decision, not a statistical one. Statistics help you identify outliers given what you believe about the dataset.
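
To make the distinction between identifying and removing concrete, here is a minimal sketch of two common flagging rules: the mean + k·sd cutoff from the question, and a median + k·MAD cutoff that is less affected by the extreme values themselves. The function names, k = 3, and the 1.4826 scaling constant (the usual factor that makes the MAD comparable to a standard deviation for normal data) are illustrative assumptions, not a recommendation. The code only flags candidates; whether to drop them is still a subject-matter call.

```python
import statistics

# Assumption: the usual constant that makes the MAD comparable
# to a standard deviation when the data are normal.
MAD_SCALE = 1.4826


def upper_cutoffs(values, k=3.0):
    """Return upper cutoffs from two rules (k = 3 is an illustrative choice)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    median = statistics.median(values)
    # Scaled median absolute deviation from the median.
    mad = MAD_SCALE * statistics.median([abs(v - median) for v in values])
    return {
        "mean_sd": mean + k * sd,
        "median_mad": median + k * mad,
    }


def flag_candidates(values, cutoff):
    """Return the values above the cutoff; nothing is removed."""
    return [v for v in values if v > cutoff]


# Toy data (hypothetical, not the data set from the question).
sample = [1, 2, 3, 4, 100]
cut = upper_cutoffs(sample)
print(cut)
print("mean + 3 sd flags:   ", flag_candidates(sample, cut["mean_sd"]))
print("median + 3 MAD flags:", flag_candidates(sample, cut["median_mad"]))
```

In this toy sample the value 100 inflates the mean and sd enough that the mean + 3·sd rule does not flag it, while the median/MAD rule does. With the summary posted in the question (and assuming the `mad` column is already scaled, as R's `mad()` does by default), the two cutoffs would be about 4990.23 and 4531.83. Neither rule accounts for skewness, so on a right-skewed variable both can flag large values that are perfectly legitimate.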

A very readable applied treatment of outliers is given in

A more advanced and detailed treatment is given in
