Neural Networks – Understanding Components of Temporal Fusion Transformer

attention, lstm, neural-networks, temporal-fusion-transformer, transformers

I'm currently reading the paper Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting: https://arxiv.org/abs/1912.09363v3

However, I had to stop at pages 7-8, which describe the different components of the TFT, because I lack knowledge of most of the pieces mentioned there. Hence my questions:

1.) What are the different components I have to understand in order to understand how the TFT works? So far I have: LSTMs (and their gates), Transformers (with encoders/decoders), and multi-head attention blocks. Are there more parts I need in order to understand the TFT, e.g. what about the GRN or the Dense layers? (See my rough GRN sketch below.)

2.) Do you know of any literature that explains these components in the context of time series AND is not too math-heavy?
I normally learn the maths much better from an example.
So far, I have read through a few blogs and watched some tutorials and videos, even one about the paper itself, but they do not focus strongly on the individual components. Furthermore, nearly every tutorial explains parts of the TFT in terms of, for example, a speech-translation problem, and not as a time-series problem.

Although I did some transfer to time series myself, I have no further literature to cross-validate my thoughts against.
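To make question 1 concrete, here is my current reading of the GRN, the component I am least sure about. As far as I can tell, the paper builds it from a dense layer with an ELU activation, a second dense layer, a Gated Linear Unit (GLU), and a residual connection followed by layer normalization. The PyTorch sketch below is just how I pieced this together and may well be wrong; the layer widths are arbitrary, and I left out the optional static context input the paper also feeds in:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedResidualNetwork(nn.Module):
    """Rough sketch of the TFT Gated Residual Network (GRN).

    Simplified: the same width is used everywhere, and the optional
    static context input from the paper is omitted.
    """
    def __init__(self, d_model: int):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_model)      # first dense layer (ELU follows)
        self.fc2 = nn.Linear(d_model, d_model)      # second dense layer
        self.glu = nn.Linear(d_model, 2 * d_model)  # value and gate projections for the GLU
        self.norm = nn.LayerNorm(d_model)

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        eta2 = F.elu(self.fc1(a))
        eta1 = self.fc2(eta2)
        # GLU: a sigmoid gate scales a linear projection elementwise, so the
        # network can suppress the whole nonlinear branch when it is not needed.
        value, gate = self.glu(eta1).chunk(2, dim=-1)
        return self.norm(a + value * torch.sigmoid(gate))  # residual + LayerNorm

# Toy usage: a batch of 8 steps with 32-dim features keeps its shape.
grn = GatedResidualNetwork(d_model=32)
print(grn(torch.randn(8, 32)).shape)  # torch.Size([8, 32])
```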

So far I have read/watched:

All the best

Best Answer

First of all, you should understand why the Temporal Fusion Transformer (TFT) is such an awesome model.

The biggest advantages of TFT are versatility and interpretability. In other words, the model works with multiple time series and with all sorts of inputs (even categorical variables!). Also, it is not a black-box model: with the attention weights you can find out which features are important, as well as the dominant seasonal patterns in your dataset! How cool is that?

I recommend reading this article. It explains in depth the different components of TFT and how they work together. A concept that may be difficult to grasp, in case you are not already familiar with it, is the attention mechanism; since TFT is a Transformer-based model, it uses attention (with some extra perks). If you are hearing the term attention for the first time (in the context of deep learning), take a look here: this is the best source on the internet that explains attention, plus it uses illustrations!
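If it helps to have something concrete alongside those links: at its core, Transformer-style attention (TFT included) is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. Here is a minimal PyTorch sketch (the function name, shapes, and toy data are mine, purely for illustration):

```python
import torch

def scaled_dot_product_attention(q, k, v):
    """Computes softmax(Q K^T / sqrt(d_k)) V and returns the weights too.

    In TFT, these attention weights are exactly what you inspect to see
    which past time steps the model attends to (e.g. seasonal patterns).
    """
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # (batch, T_q, T_k)
    weights = torch.softmax(scores, dim=-1)        # each row sums to 1
    return weights @ v, weights

# Toy usage: one series, 10 time steps, 16-dim queries/keys/values.
q = k = v = torch.randn(1, 10, 16)
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape, attn.shape)  # torch.Size([1, 10, 16]) torch.Size([1, 10, 10])
```

Multi-head attention simply runs several of these in parallel with different learned projections; TFT's interpretable variant shares the value weights across heads so that the averaged attention weights remain meaningful.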

(Disclaimer: I am the author of the first article)
