In recurrent neural networks with connections from the output units to the hidden units, teacher forcing can speed up training by letting the different time steps be learned in parallel. In teacher forcing, the ground-truth output at the current time step (available in the training data) is fed back to compute the system state at the next time step, instead of the model's own output. This is obviously faster than using the actual model output during training. But the question is: is it also more accurate?
Maybe, if we are not worried about training time, it is better to use the actual model output instead of the ground-truth output, since, when the model is deployed, the model's own output is ultimately what produces the system state at the next time steps.
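To make the setup above concrete, here is a minimal sketch (in NumPy, with toy sizes and random weights of my own choosing) of an RNN whose only recurrence is output-to-hidden, which is exactly the case where teacher forcing decouples the time steps:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy RNN whose only recurrence is output-to-hidden:
#   h_t = tanh(W_o o_{t-1} + W_x x_t),   o_t = V h_t
H, X, T = 4, 3, 5                      # hidden size, input size, sequence length
W_o = rng.normal(scale=0.3, size=(H, 1))
W_x = rng.normal(scale=0.3, size=(H, X))
V = rng.normal(scale=0.3, size=(1, H))

xs = rng.normal(size=(T, X, 1))        # input sequence
ys = rng.normal(size=(T, 1, 1))        # ground-truth outputs from the training data

def step(o_prev, x_t):
    """One RNN step: previous output feeds the hidden state."""
    h = np.tanh(W_o @ o_prev + W_x @ x_t)
    return V @ h

# Teacher forcing: every h_t uses the *ground-truth* y_{t-1}, so each
# time step depends only on known quantities and all steps could be
# computed independently (in parallel).
o_tf = [step(ys[t - 1] if t > 0 else np.zeros((1, 1)), xs[t])
        for t in range(T)]

# Free running: h_t uses the model's *own* previous output, so the
# loop is inherently sequential.
o_fr, o_prev = [], np.zeros((1, 1))
for t in range(T):
    o_prev = step(o_prev, xs[t])
    o_fr.append(o_prev)
```

Both loops agree at t = 0 (both start from a zero output) and then diverge, which is exactly the train/test mismatch the question is asking about.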
Machine Learning – Is Teacher Forcing More Accurate Than Using Actual Model Output or Just Faster?
machine-learning, neural-networks, recurrent-neural-network
Best Answer
I'll begin by saying I'm no expert but was thinking about this same question. A little googling led me to this page:
https://machinelearningmastery.com/teacher-forcing-for-recurrent-neural-networks/
and, in turn, this paper:
https://arxiv.org/pdf/1610.09038.pdf
which has a paragraph in the introduction addressing this to some degree.
In addition, the Deep Learning book (http://www.deeplearningbook.org/contents/rnn.html, p. 378) discusses the same trade-off.
I would imagine (again, not an expert) that it is fairly problem-dependent, but that the main gains of teacher forcing are computational and in simplifying the loss landscape: since the whole sequence contributes to the parameter gradients, backpropagation through time over long sequences can make it difficult for the optimiser to converge, even given a lot of computational time.
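The Professor Forcing paper linked above tackles the train/test mismatch with an adversarial objective; a simpler related idea is scheduled sampling (Bengio et al., 2015), where training randomly mixes the two feedback sources and anneals toward the model's own output. A rough sketch, with function names and the linear schedule being my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

def choose_feedback(y_true_prev, y_model_prev, p_teacher):
    """With probability p_teacher feed back the ground truth,
    otherwise the model's own previous output (scheduled-sampling-style)."""
    return y_true_prev if rng.random() < p_teacher else y_model_prev

def p_teacher_at(epoch, total_epochs):
    """Hypothetical linear decay: start fully teacher-forced,
    end fully free-running."""
    return max(0.0, 1.0 - epoch / total_epochs)
```

At each decoding step during training, `choose_feedback` would replace the unconditional use of the ground truth, so the model gradually learns to cope with its own predictions.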
Hope that helps!