I am trying to understand the transformer model from "Attention Is All You Need", following The Annotated Transformer.
The architecture looks like this: [encoder-decoder diagram from the paper]
Everything is essentially clear, save for the output embedding on the bottom right. During training, I understand that one can use the actual target as input; all one needs to do is
- shift the target by one position to the right
- use a mask to prevent using, say, the $(n+k)$-th word of the output to predict the $n$-th one (see the sketch after this list)
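
For concreteness, here is a minimal PyTorch sketch of both steps. It assumes `tgt` is a batch of target token ids that already includes start and end tokens; the `subsequent_mask` helper follows the pattern used in The Annotated Transformer, but the exact names are illustrative:

```python
import torch

def subsequent_mask(size):
    # Causal mask of shape (1, size, size): entry (i, j) is True iff j <= i,
    # so position i may attend only to positions up to and including itself.
    upper = torch.triu(torch.ones(1, size, size), diagonal=1)
    return upper == 0

# Hypothetical batch of target token ids: <s> w1 w2 w3 </s>
tgt = torch.tensor([[1, 5, 9, 4, 2]])

decoder_input = tgt[:, :-1]     # "shifted right" input:       <s> w1 w2 w3
decoder_target = tgt[:, 1:]     # what the model must predict:  w1 w2 w3 </s>
tgt_mask = subsequent_mask(decoder_input.size(1))  # (1, 4, 4) causal mask
```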
What is not clear to me is how to use the model at inference time. When doing inference, one of course does not yet have the output, so what goes there?
Best Answer
At inference time the decoder is used autoregressively: you start from a start-of-sequence token, predict the next token, append it to the "output" fed to the decoder, and repeat until an end-of-sequence token is produced (or a length limit is reached). A popular method for such sequence generation tasks is beam search, which keeps the K best partial sequences generated so far as candidate output sequences.
In the original paper, different beam sizes were used for different tasks. With a beam size of K = 1, this reduces to the greedy method described in the blog post you mentioned.
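
For illustration, here is a minimal greedy-decoding sketch (i.e. beam size K = 1). It assumes a trained model exposing `encode`, `decode`, and `generator` in the style of The Annotated Transformer, plus the `subsequent_mask` helper sketched above; the names are assumptions for the sake of the example, not a fixed API:

```python
import torch

def greedy_decode(model, src, src_mask, max_len, start_symbol):
    # Encode the source sentence once; the result is reused at every step.
    memory = model.encode(src, src_mask)
    # The "output" fed to the decoder starts as just the start-of-sequence token.
    ys = torch.full((1, 1), start_symbol, dtype=torch.long)
    for _ in range(max_len - 1):
        # Decode conditioned on everything generated so far (with a causal mask).
        out = model.decode(memory, src_mask, ys, subsequent_mask(ys.size(1)))
        # Distribution over the vocabulary for the last position only.
        prob = model.generator(out[:, -1])
        next_word = prob.argmax(dim=1)
        # Append the prediction and feed it back in on the next iteration.
        ys = torch.cat([ys, next_word.unsqueeze(0)], dim=1)
        # (A real implementation would also stop early on an end-of-sequence token.)
    return ys
```

Beam search generalizes this loop by keeping the K highest-scoring partial sequences at each step instead of only the single best continuation.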