Time Series Classification – Using Transformer Encoder for Time Series Classification

classification, machine-learning, neural-networks, transformers

Let's say I have a collection of tensors, each representing a time series with 64 points and 4 features, so each tensor has shape [64, 4]. I am trying to classify these series. To do that, I first pass each tensor through a Transformer encoder (with 2 attention heads and 2 encoder layers) that outputs a tensor of the same shape. This output is flattened and passed to a dense layer for classification. Is there any advantage to passing the time series through the encoder and classifying the encoded output, rather than passing the original tensors directly to the dense layer?
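For concreteness, here is a minimal PyTorch sketch of the two setups being compared. The shapes and encoder configuration follow my description above; NUM_CLASSES and dim_feedforward are placeholder assumptions:

```python
import torch
import torch.nn as nn

SEQ_LEN, N_FEATURES, NUM_CLASSES = 64, 4, 2  # NUM_CLASSES is an assumption

class EncoderClassifier(nn.Module):
    """Transformer encoder (2 heads, 2 layers) -> flatten -> dense head."""
    def __init__(self):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=N_FEATURES, nhead=2, dim_feedforward=64, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(SEQ_LEN * N_FEATURES, NUM_CLASSES)

    def forward(self, x):               # x: [batch, 64, 4]
        z = self.encoder(x)             # same shape as the input: [batch, 64, 4]
        return self.head(z.flatten(1))  # logits: [batch, NUM_CLASSES]

class DenseOnlyClassifier(nn.Module):
    """Baseline: flatten the raw series and classify directly."""
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(SEQ_LEN * N_FEATURES, NUM_CLASSES)

    def forward(self, x):
        return self.head(x.flatten(1))

x = torch.randn(8, SEQ_LEN, N_FEATURES)  # a dummy batch
print(EncoderClassifier()(x).shape)       # torch.Size([8, 2])
print(DenseOnlyClassifier()(x).shape)     # torch.Size([8, 2])
```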

I tried this experimentally and saw no significant increase in accuracy when using the Transformer encoder. However, my data was quite simple and not extensive enough to draw any conclusions. Also, an expert I know insists that the model with the Transformer-processed input should work better.

One thing I observed was a steeper decrease in loss when using the encoded tensors for classification.

I also referred to this resource on this matter: https://www.linkedin.com/pulse/time-series-classification-model-based-transformer-gokmen/

Best Answer

This is an interesting question.

I suspect the Transformer might help if it is first pretrained on a different, unsupervised task, such as predicting masked timesteps. Then, if this pretraining was done on a large amount of unlabelled data, I could imagine that a classification head on top of the pretrained Transformer would perform better on your classification task than a fully connected network trained directly for classification.

The benefit would be that the Transformer learns features of your data, even though you do not get any additional labelled data for the classification task.
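As an illustration, a masked-timestep pretraining objective for the same encoder could look roughly like the sketch below. This is one way to set it up, not a prescription: the 15% mask ratio and the choice of zeroing out masked timesteps are assumptions.

```python
import torch
import torch.nn as nn

def masked_reconstruction_loss(encoder, x, mask_ratio=0.15):
    """Hide random timesteps and train the encoder to reconstruct them.

    encoder: a module that maps [batch, seq_len, n_features] to the same shape
    x:       a batch of unlabelled series, [batch, seq_len, n_features]
    """
    # Boolean mask over timesteps: True = hidden from the encoder.
    mask = torch.rand(x.shape[:2], device=x.device) < mask_ratio  # [batch, seq_len]
    # Zero out the masked timesteps (zeros act as a crude mask token here).
    corrupted = x.masked_fill(mask.unsqueeze(-1), 0.0)
    recon = encoder(corrupted)  # encoder preserves the input shape
    # Reconstruction error is computed only on the masked positions.
    return nn.functional.mse_loss(recon[mask], x[mask])
```

After pretraining the encoder this way on the unlabelled series, you would keep its weights (frozen or fine-tuned) and train only the classification head on the labelled set.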

Only experiments will tell you for sure.