Solved – How to Perform Validation in Unsupervised Learning

anomaly detection · cross-validation · unsupervised learning · validation

Since I'm using unsupervised learning, I have no ground truth to compare against during the validation phase. Is there a standard method to deal with this?


Additional information:

  • in my particular case, "validation" means cross-validation.
  • I'm developing a custom binary anomaly detection model that labels dataset records in two classes: "normal" and "abnormal"
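One label-free proxy sometimes used for a setup like this is the stability of the predicted labels across cross-validation folds: if a detector's "normal"/"abnormal" assignments change drastically depending on which fold it was trained on, it is unlikely to be trustworthy. A minimal sketch, using scikit-learn's IsolationForest as a stand-in for the custom model:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
# Synthetic data: a dense "normal" cluster plus a few scattered outliers.
X = np.vstack([rng.normal(0, 1, size=(200, 2)),
               rng.uniform(-6, 6, size=(10, 2))])

# Fit a detector on each training fold, predict on the SAME full
# dataset, then measure how much the labelings agree across folds.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
labelings = []
for train_idx, _ in kf.split(X):
    model = IsolationForest(random_state=0).fit(X[train_idx])
    labelings.append(model.predict(X))  # +1 = normal, -1 = abnormal

labelings = np.array(labelings)
# Mean pairwise agreement: fraction of points labeled identically by
# each pair of fold-models, averaged over all pairs.
pairs = [(i, j) for i in range(5) for j in range(i + 1, 5)]
stability = np.mean([(labelings[i] == labelings[j]).mean() for i, j in pairs])
print(f"mean pairwise label agreement: {stability:.2f}")
```

A stability score near 1 only says the model is consistent, not that it is correct, so this is a sanity check rather than a full validation criterion.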

Best Answer

I'm not sure if this will be considered an answer, since it is really a pointer to a possible answer, but I don't have enough reputation to add it as a comment. So it will go here; maybe someone with more rights can move it to a comment.

I'm struggling with this topic too, and today I found this PhD thesis:

"Cross-Validation for Unsupervised Learning" by Patrick O. Perry, Stanford University, September 2009. In the abstract the author states:

This thesis discusses some extensions of cross-validation to unsupervised learning, specifically focusing on the problem of choosing how many principal components to keep. We introduce the latent factor model, define an objective criterion, and show how CV can be used to estimate the intrinsic dimensionality of a data set. Through both simulation and theory, we demonstrate that cross-validation is a valuable tool for unsupervised learning.

http://ptrckprry.com/reports/
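To make the idea in the abstract concrete, one simple hold-out scheme for choosing the number of principal components is: for each candidate k, fit PCA on a training fold and measure reconstruction error on the held-out rows. This is a rough sketch of the general idea, not the thesis's exact method, and it assumes scikit-learn's PCA:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold

rng = np.random.default_rng(1)
# Data with an intrinsic rank of 3 plus a little noise.
latent = rng.normal(size=(300, 3))
loadings = rng.normal(size=(3, 10))
X = latent @ loadings + 0.1 * rng.normal(size=(300, 10))

kf = KFold(n_splits=5, shuffle=True, random_state=1)
errors = {}
for k in range(1, 8):
    fold_err = []
    for train_idx, test_idx in kf.split(X):
        pca = PCA(n_components=k).fit(X[train_idx])
        # Project held-out rows onto the fitted components and back,
        # then score the mean squared reconstruction error.
        X_hat = pca.inverse_transform(pca.transform(X[test_idx]))
        fold_err.append(np.mean((X[test_idx] - X_hat) ** 2))
    errors[k] = np.mean(fold_err)

for k, e in errors.items():
    print(k, round(e, 4))
```

Note that this naive row-hold-out variant is known to be biased toward larger k (held-out reconstruction error tends to keep shrinking as components are added); schemes that hold out individual matrix entries or blocks, as discussed in the thesis and related work, are designed to avoid this.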
