Machine Learning – How to Perform Transfer Learning with Autoencoders

machine learning, neural networks, transfer learning

I have been thinking of training a variational autoencoder on a larger texture dataset, so that I can fine-tune it on my specific texture dataset, in the hope that the reconstruction will be better.

I have not really found anything on how to do this fine-tuning with autoencoders. Do I add a layer before and after the latent vector, or do I have to do something else?

Best Answer

I am not sure I am understanding your question correctly, but from what I understand: you can train all layers on the large texture dataset, then freeze the weights of all layers up to and including the bottleneck layer (the one from which you extract the new features) and train the remaining layers on the new dataset.
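As a minimal sketch of that first approach (assuming a plain dense autoencoder rather than a variational one; `x_large` and `x_small` are placeholders for the large and the specific texture datasets):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical dense autoencoder for illustration; a real texture model
# would likely be convolutional, but the freezing logic is the same.
input_dim, latent_dim = 784, 32

autoencoder = keras.Sequential([
    keras.Input(shape=(input_dim,)),
    layers.Dense(128, activation="relu", name="enc_1"),
    layers.Dense(latent_dim, activation="relu", name="bottleneck"),
    layers.Dense(128, activation="relu", name="dec_1"),
    layers.Dense(input_dim, activation="sigmoid", name="dec_out"),
])
autoencoder.compile(optimizer="adam", loss="mse")

# 1) Pre-train all layers on the large texture dataset
#    (x_large is a placeholder for your own data):
# autoencoder.fit(x_large, x_large, epochs=50, batch_size=256)

# 2) Freeze everything up to and including the bottleneck, then
#    fine-tune the remaining layers on the specific dataset:
for layer in autoencoder.layers:
    layer.trainable = False
    if layer.name == "bottleneck":
        break
autoencoder.compile(optimizer="adam", loss="mse")  # re-compile after changing trainability
# autoencoder.fit(x_small, x_small, epochs=20, batch_size=64)
```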

If your aim is just reconstruction, not dimensionality reduction: you can do the same thing, but instead freeze all layers except the last one, so you end up optimizing the weights of only the last layer.
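Reusing the hypothetical model from the sketch above, that variant only changes which layers get frozen:

```python
# Freeze every layer except the last one, so fine-tuning only
# updates the weights of the final reconstruction layer:
for layer in autoencoder.layers[:-1]:
    layer.trainable = False
autoencoder.layers[-1].trainable = True
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_small, x_small, epochs=20, batch_size=64)
```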

Either way, the code will look like the sketches above; only the specific network architecture will change. The Keras transfer-learning guide walks through the same pattern:

https://keras.io/guides/transfer_learning/

Hope this helps, good luck

Edit:

Then you should follow the first approach. Reconstruction error might be misleading, because autoencoders also risk overfitting in the sense that the extracted features may be useless when you feed them to a new model. The best way to find the right encoding is trial and error: use your extracted features for your specific goal and see which approach yields better performance.

You can think of transfer learning as acquiring fundamental knowledge about a particular topic and then specializing in a sub-topic. Consider the example of general news and financial news: both are news, but the context is entirely different. I can train my autoencoder on general news so that it sees lots of varied examples and constructs meaningful word vectors. Then I freeze the layers up to and including the bottleneck, so my model preserves the information acquired from the sizeable textual data. After that, I can train the remaining layers on my financial-news dataset, forcing the model to learn finance-specific jargon.

If instead I just trained my model on financial news, the reconstruction error would probably be lower. Still, the extracted features might not be useful, because my financial-news dataset might not have enough variance. That is, my autoencoder would just be overfitting the data at hand.
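For that trial-and-error step, one option (again just a sketch reusing the hypothetical model above; `x_small`, `y_small`, and the binary downstream task are assumptions for illustration) is to extract the bottleneck features and feed them to a small downstream model, then compare performance across encodings:

```python
# Build a feature extractor that outputs the bottleneck activations.
feature_extractor = keras.Model(
    inputs=autoencoder.input,
    outputs=autoencoder.get_layer("bottleneck").output,
)
# features = feature_extractor.predict(x_small)

# A small downstream classifier on top of the extracted features;
# whichever encoding gives better validation performance wins.
classifier = keras.Sequential([
    keras.Input(shape=(latent_dim,)),
    layers.Dense(1, activation="sigmoid"),
])
classifier.compile(optimizer="adam", loss="binary_crossentropy",
                   metrics=["accuracy"])
# classifier.fit(features, y_small, validation_split=0.2)
```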
