The intuition behind the sparsity parameter in sparse autoencoders

autoencoders, deep learning, unsupervised learning

A sparse autoencoder is an unsupervised learning algorithm that tries to learn an approximation of the identity function on its input. As mentioned in the notes of Andrew Ng's lecture on deep learning, the average activation of each neuron in the hidden layer over the training set is restricted to a small value, say 0.01, called the sparsity parameter (rho). I am confused: why would we want to restrict the activation of the hidden neurons?
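
Concretely, the restriction in Ng's notes is enforced by adding a KL-divergence penalty to the cost, pushing each hidden unit's average activation rho_hat toward rho. A minimal numpy sketch of that term (the function name and the beta weight are illustrative, not from the notes verbatim):

```python
import numpy as np

def sparsity_penalty(hidden_activations, rho=0.01, beta=3.0):
    """KL-divergence sparsity penalty from Ng's sparse autoencoder notes.

    hidden_activations: (m, n_hidden) array of sigmoid activations
    over m training examples.
    rho:  target average activation (the sparsity parameter).
    beta: weight of the penalty in the overall cost (illustrative value).
    """
    # rho_hat_j: average activation of hidden unit j over the training set
    rho_hat = hidden_activations.mean(axis=0)
    # KL(rho || rho_hat_j), summed over all hidden units
    kl = (rho * np.log(rho / rho_hat)
          + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return beta * kl.sum()
```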

Best Answer

An autoencoder attempts to reconstruct its input. In the process it could learn the identity function if the hidden layer is at least as large as the input layer. However, that is not desirable: a network that simply copies its input has learned nothing about the structure of the data.
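
To make that failure mode concrete, here is a toy illustration with linear units, no biases, and hand-picked weights (the overcomplete case behaves the same way): identity weights give perfect reconstruction while encoding nothing useful.

```python
import numpy as np

# With a hidden layer at least as large as the input, the network can
# implement the identity: W1 = I, W2 = I reconstructs every input
# exactly, so the reconstruction loss is zero without learning features.
n = 4
W1 = np.eye(n)               # encoder: h = W1 @ x
W2 = np.eye(n)               # decoder: x_hat = W2 @ h
x = np.random.randn(n)
x_hat = W2 @ (W1 @ x)        # reconstruction equals the input exactly
assert np.allclose(x, x_hat)
```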

During learning, the autoencoder instead discovers the most common features in the input. For example, if the inputs are natural images, it discovers edges, because edges are among the most common features across natural images.

In the simplest case, the autoencoder is constructed with fewer hidden units than input units. As hidden units are added, it can enlist more features to represent the input. However, once the number of hidden units exceeds the number of input units, the features become more and more dependent on one another: when the hidden layer is densely activated, the network can fall back on redundant combinations of units (ultimately the identity mapping) rather than distinct features.

Sparsity restricts the activation of the hidden units, which reduces the dependency between features. This allows us to increase the number of hidden units, and hence the number of learned features, which is desirable.
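
Putting the pieces together, here is a minimal sketch of a one-hidden-layer sparse autoencoder trained by batch gradient descent with the KL penalty, following the derivation in Ng's notes (function names and hyperparameter values are illustrative, not tuned):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sparse_autoencoder(X, n_hidden=25, rho=0.01, beta=3.0,
                             lr=0.5, n_steps=2000, seed=0):
    """Train a sparse autoencoder on X, an (m, n) matrix in [0, 1]."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W1 = rng.normal(0, 0.1, (n, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, n)); b2 = np.zeros(n)

    for _ in range(n_steps):
        # Forward pass
        a1 = sigmoid(X @ W1 + b1)          # hidden activations, (m, h)
        a2 = sigmoid(a1 @ W2 + b2)         # reconstruction, (m, n)
        rho_hat = a1.mean(axis=0)          # average activation per unit

        # Backward pass: reconstruction term of the cost
        delta2 = (a2 - X) / m * a2 * (1 - a2)
        # Sparsity term: gradient of beta * sum_j KL(rho || rho_hat_j)
        # with respect to a1, broadcast over the m examples
        d_sparse = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / m
        delta1 = (delta2 @ W2.T + d_sparse) * a1 * (1 - a1)

        # Gradient-descent updates
        W2 -= lr * a1.T @ delta2; b2 -= lr * delta2.sum(axis=0)
        W1 -= lr * X.T @ delta1;  b1 -= lr * delta1.sum(axis=0)

    return W1, b1, W2, b2
```

Trained on small natural-image patches, each column of W1 holds one hidden unit's input weights and can be visualized as the feature that unit has learned; with the sparsity penalty active, these tend to look like localized edge detectors.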