Does attention help with standard auto-encoders?

I understand the use of attention mechanisms in encoder-decoder models for sequence-to-sequence problems such as machine translation.

I am trying to figure out whether it is possible to use attention mechanisms with standard auto-encoders for feature extraction, where the goal is to compress the data into a latent vector.

Suppose we have time series data with N dimensions and want to use an auto-encoder with an attention mechanism (I am thinking of self-attention because it seems more appropriate in this case, though I might be wrong) to better learn the interdependence among the elements of the input sequence, and thus get a better latent vector L.

Or would it be better to use a recurrent neural network or one of its variants in this case?

Does anyone have better thoughts or an intuition behind this?

Best Answer

I think attention can help. Please refer to this answer.

There are many ways to incorporate attention into an autoencoder. The simplest is to borrow the idea from BERT, a stack of self-attention layers, but make the middle layers thinner so that they act as the bottleneck that produces the latent vector.
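To make the "thinner middle" idea concrete, here is a minimal sketch of what such a model could look like for a time series of shape (T, N). It is only an illustration under several assumptions of mine, not BERT itself: the class name `AttentionAutoencoder`, the sizes `d_model` and `latent_dim`, the single attention head, the mean pooling and the sinusoidal position codes are all hypothetical choices. The weights are random and untrained, so it shows the data flow and the shapes rather than a useful representation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the maximum for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def positional_encoding(seq_len, d_model):
    # Sinusoidal position codes so the model can tell time steps apart.
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))


class AttentionAutoencoder:
    """Hypothetical toy model: forward pass only, random untrained weights."""

    def __init__(self, n_features, seq_len, d_model=16, latent_dim=4, seed=0):
        rng = np.random.default_rng(seed)

        def init(*shape):
            return rng.normal(0.0, 0.1, size=shape)

        self.seq_len, self.d_model = seq_len, d_model
        self.w_in = init(n_features, d_model)
        self.w_q = init(d_model, d_model)
        self.w_k = init(d_model, d_model)
        self.w_v = init(d_model, d_model)
        self.w_latent = init(d_model, latent_dim)  # the thin middle
        self.w_expand = init(latent_dim, seq_len * d_model)
        self.w_out = init(d_model, n_features)
        self.pos = positional_encoding(seq_len, d_model)

    def encode(self, x):
        h = x @ self.w_in + self.pos  # (T, d_model)
        q, k, v = h @ self.w_q, h @ self.w_k, h @ self.w_v
        # Scaled dot-product self-attention: each row weights all time steps.
        weights = softmax(q @ k.T / np.sqrt(self.d_model))  # (T, T)
        h = h + weights @ v  # residual connection
        # Pool over time, then squeeze into the latent vector L.
        z = np.tanh(h.mean(axis=0) @ self.w_latent)  # (latent_dim,)
        return z, weights

    def decode(self, z):
        # Expand the latent vector back to one hidden state per time step.
        h = (z @ self.w_expand).reshape(self.seq_len, self.d_model)
        return h @ self.w_out  # (T, n_features)


T, N = 20, 3  # assumed toy sizes: 20 time steps, 3 dimensions
x = np.random.default_rng(1).normal(size=(T, N))
model = AttentionAutoencoder(n_features=N, seq_len=T)
z, weights = model.encode(x)
x_hat = model.decode(z)
print(z.shape, weights.shape, x_hat.shape)
```

In practice you would build the same structure in a framework with automatic differentiation and train it to minimise the reconstruction error between `x` and `x_hat`. The attention weights are where the interdependence between time steps is captured before everything is squeezed into the latent vector.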

