Accuracy of anomaly detection for unlabelled data

Tags: accuracy, anomaly detection, k-means, outliers

Is there a method to calculate how well your model classified outliers in an unlabelled dataset?

I am currently using K-means on a data set in which each car's speed is plotted against time. There are about 200 cars in a single data set, hence 200 plots in the graph.

Any paper reference or suggestions would help.

Thanks.

Best Answer

Yes and no. In clustering there are a number of measures of how 'successful' an unsupervised method has been, e.g. the Silhouette coefficient. To my knowledge, very few papers have been published on the equivalent question in the outlier detection field.
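To make the Silhouette coefficient concrete, here is a minimal pure-Python sketch. The function name `mean_silhouette` and the toy points are made up for illustration and are not the asker's data; in practice a library routine such as scikit-learn's `sklearn.metrics.silhouette_score` would normally be used instead. Note that this scores a clustering, not an outlier detector.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all points (O(n^2), illustration only)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: two well-separated groups of (time, speed) points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible groups
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

good_score = mean_silhouette(points, good_labels)
bad_score = mean_silhouette(points, bad_labels)
print(f"good labelling: {good_score:.2f}, bad labelling: {bad_score:.2f}")
```

Values close to 1 indicate compact, well-separated clusters, while values near 0 or below indicate overlapping or badly assigned clusters.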

The paper "On the internal evaluation of unsupervised outlier detection" by Marques et al. is one of them. I am not aware of a public implementation of that paper's method.

However, one should always be aware of what these methods actually measure: only how well your method's output agrees with the evaluation method's own definition of an outlier. That is why I also said "no" to the question of whether there is a nice way of evaluating your problem. As there is no ground truth, you can only describe the data, not evaluate it, at least not in the true sense.
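A small sketch of why the definition matters: the two scoring rules below are illustrative assumptions (they are not taken from the paper mentioned above), and the points are made up. Each rule encodes a different idea of what an outlier is, and they disagree about which point is the most anomalous.

```python
import math

def centroid_distance_scores(points):
    """Outlier score = distance to the mean of all points."""
    n = len(points)
    dims = len(points[0])
    centroid = tuple(sum(p[d] for p in points) / n for d in range(dims))
    return [math.dist(p, centroid) for p in points]

def nearest_neighbour_scores(points):
    """Outlier score = distance to the closest other point."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]

def top_outlier(scores):
    """Index of the point with the highest outlier score."""
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical data: a dense group, a far-away pair, and one isolated point.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),  # dense group
    (10, 10), (10, 10.2),                        # far away, but close to each other
    (4, -3),                                     # closer to the bulk, but isolated
]

by_centroid = top_outlier(centroid_distance_scores(points))
by_neighbour = top_outlier(nearest_neighbour_scores(points))
print("top outlier by centroid distance:", points[by_centroid])
print("top outlier by nearest-neighbour distance:", points[by_neighbour])
```

The centroid rule flags one of the far-away pair, while the nearest-neighbour rule flags the isolated point. Without labels, neither answer can be called correct; an internal evaluation index would simply favour whichever rule is closer to its own notion of an outlier.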
