Solved – Is it necessary to split the dataset into train, test, and validation sets for an unsupervised machine learning algorithm (e.g. an autoencoder)?

autoencoders, unsupervised learning

In supervised machine learning, model performance is generally measured by splitting the data into train, test, and validation sets.

But for an unsupervised method such as an autoencoder, is such a split necessary? Can't the training dataset itself serve as the test set for evaluating performance?

Best Answer

In the case of a plain auto-encoder, the task is to encode the given dataset. So the notion of new/unseen data samples does not arise in that setup.

On the other hand, for unsupervised tasks such as variational auto-encoders, it is definitely of interest to have a train/test/validation split in order to figure out how well the learned representation of the dataset generalises to new/unseen data samples. The split also helps you detect whether you are already overfitting the training set, so it can be used to decide when to stop training, e.g. as an early stopping criterion.
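To make the early-stopping point concrete, here is a minimal sketch. It uses a tiny linear auto-encoder written in NumPy as a stand-in (not a variational auto-encoder), trained on synthetic data. The function name `train_autoencoder`, the `patience` parameter, and all hyperparameter values are assumptions chosen for illustration, not something from the question or from a particular library.

```python
import numpy as np


def train_autoencoder(X_train, X_val, n_hidden=2, lr=0.01,
                      max_epochs=500, patience=10, seed=0):
    """Train a linear auto-encoder, stopping early on validation loss.

    Hypothetical helper for illustration only.
    Returns the best weights, the best validation loss, and the
    per-epoch (train_loss, val_loss) history.
    """
    rng = np.random.default_rng(seed)
    n_features = X_train.shape[1]
    W_enc = rng.normal(scale=0.1, size=(n_features, n_hidden))
    W_dec = rng.normal(scale=0.1, size=(n_hidden, n_features))

    best_val = np.inf
    best_weights = (W_enc.copy(), W_dec.copy())
    epochs_without_improvement = 0
    history = []

    for _ in range(max_epochs):
        # Forward pass on the training set only.
        Z = X_train @ W_enc
        err = Z @ W_dec - X_train
        train_loss = np.mean(err ** 2)

        # Gradients of the squared reconstruction error
        # (up to a constant factor), then a plain gradient step.
        grad_dec = Z.T @ err / len(X_train)
        grad_enc = X_train.T @ (err @ W_dec.T) / len(X_train)
        W_enc -= lr * grad_enc
        W_dec -= lr * grad_dec

        # Reconstruction error on held-out data the model never trained on.
        val_loss = np.mean((X_val @ W_enc @ W_dec - X_val) ** 2)
        history.append((train_loss, val_loss))

        if val_loss < best_val:
            best_val = val_loss
            best_weights = (W_enc.copy(), W_dec.copy())
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation loss stopped improving

    return best_weights, best_val, history


# Synthetic data: 5 observed features driven by 2 latent factors plus noise.
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(200, 5))

# 80/20 train/validation split on shuffled row indices.
idx = rng.permutation(len(X))
X_train, X_val = X[idx[:160]], X[idx[160:]]

best_weights, best_val, history = train_autoencoder(X_train, X_val)
print(f"epochs run: {len(history)}, best validation loss: {best_val:.4f}")
```

A third, untouched test split would then be scored once with `best_weights` to report generalisation. If the training loss keeps falling while the validation loss rises, that gap is the overfitting signal the answer describes.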