Solved – What does pre-training mean in a deep autoencoder

autoencoders, deep learning, machine learning, neural networks

I am confused by the term "pre-training". What does it mean in the context of a deep autoencoder, and how does it help improve the autoencoder's performance? (I know this term comes from Hinton's 2006 paper, "Reducing the Dimensionality of Data with Neural Networks".)

Best Answer

An autoencoder is a stack of $K$ layers of the form $$ y^k = \sigma(W^k y^{k-1} + b^k), $$ where $y^{k-1}$ is the input to layer $k$ and $y^k$ is its output; $y^0$ is the input to the net and $y^K$ is its final output. The stack is then trained to minimize some reconstruction loss, e.g. $$ \mathcal{L}(W^1, b^1, \dots, W^K, b^K) = ||y^K - y^0||_2^2. $$ Pre-training means optimising a similar objective layer-wise first: you minimize a per-layer loss $\mathcal{L}^k$, going from $k=1$ to $k=K$, before training the whole stack jointly.
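To make the notation concrete, here is a minimal NumPy sketch of such a stack and its end-to-end reconstruction loss (my own illustration, not code from the paper; the sigmoid nonlinearity, layer widths, and dummy input are assumptions chosen for the example):

```python
# Stack of K layers y^k = sigmoid(W^k y^{k-1} + b^k), with the end-to-end
# reconstruction loss ||y^K - y^0||_2^2. Layer widths here are assumptions.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
sizes = [784, 256, 64, 256, 784]          # assumed encoder/decoder layer widths
params = [(rng.normal(0.0, 0.01, (n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(y0, params):
    """Run the input through all K layers and return every activation y^0 .. y^K."""
    ys = [y0]
    for W, b in params:
        ys.append(sigmoid(W @ ys[-1] + b))
    return ys

y0 = rng.random(784)                                # a dummy input vector
ys = forward(y0, params)
reconstruction_loss = np.sum((ys[-1] - y0) ** 2)    # ||y^K - y^0||_2^2
```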

A popular example is to minimize the layer-wise reconstruction error $$ \mathcal{L}^k = ||{W^k}^T \sigma(W^k y^{k-1} + b^k) - y^{k-1}||_2^2 $$ with respect to $W^k, b^k$. In other words, each layer first learns to auto-encode its own input.
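Below is a minimal sketch of this greedy layer-wise scheme with tied weights (again my own illustration under assumed sizes, learning rate, and dummy data, not the paper's procedure): each layer is trained to reconstruct its own input, and its hidden codes then become the training data for the next layer.

```python
# Greedy layer-wise pretraining of tied-weight autoencoder layers:
# layer k minimises || W^k.T sigmoid(W^k y + b^k) - y ||^2 on its own input y.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_layer(Y, n_hidden, lr=0.1, epochs=50, seed=0):
    """Train one tied-weight autoencoder layer on the rows of Y by gradient descent."""
    rng = np.random.default_rng(seed)
    n_in = Y.shape[1]
    W = rng.normal(0.0, 0.01, (n_hidden, n_in))
    b = np.zeros(n_hidden)
    for _ in range(epochs):
        for y in Y:
            h = sigmoid(W @ y + b)                 # encoder: hidden code
            r = W.T @ h - y                        # reconstruction residual
            g = (2.0 * (W @ r)) * h * (1.0 - h)    # gradient wrt the pre-activation
            # the gradient wrt W has a decoder term and an encoder term
            dW = 2.0 * np.outer(h, r) + np.outer(g, y)
            W -= lr * dW
            b -= lr * g
    return W, b

# Greedy schedule: k = 1 .. K, each layer auto-encodes the previous layer's output.
rng = np.random.default_rng(1)
data = rng.random((100, 64))             # dummy dataset: 100 samples of dimension 64
layer_sizes = [32, 16]                   # assumed hidden sizes of the encoder stack
codes, pretrained = data, []
for n_hidden in layer_sizes:
    W, b = pretrain_layer(codes, n_hidden)
    pretrained.append((W, b))
    codes = sigmoid(codes @ W.T + b)     # feed the codes forward to the next layer
# 'pretrained' now holds layer-wise weights used to initialise the full stack
# before end-to-end fine-tuning on the global reconstruction loss.
```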

Note that this strategy is largely obsolete nowadays, due to non-saturating transfer functions (e.g. ReLU), a better understanding of the optimisation problem, and GPUs.