Reinforcement Learning – Handling Large Gradients in DQN Models

Tags: gradient descent, neural networks, regression, reinforcement learning

I am reading these notes, and on slide 34 I came across strategies to prevent gradients from becoming too large in Deep Q-Learning (DQN). Since we don't usually use very deep architectures in DQN, I don't think it's an exploding-gradient problem. My understanding is that it has something to do with the squared-error regression loss, since DQN is a regression network. Could someone please explain it to me?

Best Answer

It's hard to say for sure, since the slide is so sparse, but I think the implication is that the square of an error with absolute value larger than 1 grows very quickly, and the gradient of a squared-error loss grows linearly with the error, so large errors produce correspondingly large updates. The author's suggestion of the Huber loss makes sense as a remedy, because you can choose the point beyond which it switches to a mean absolute error (MAE) style loss, whose gradient does not keep growing with the size of the error.
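To make that concrete, here is a minimal NumPy sketch comparing the gradient of a squared-error loss with the gradient of the Huber loss as the error grows. The threshold delta = 1 is just an assumed value (it is the parameter you get to choose), and the helper names are purely illustrative:

    import numpy as np

    def squared_error_grad(error):
        # derivative of 0.5 * error**2 with respect to error: grows linearly with the error
        return error

    def huber_grad(error, delta=1.0):
        # derivative of the Huber loss: equals the error inside |error| <= delta,
        # and is capped at +/- delta (the MAE-like region) outside it
        return np.clip(error, -delta, delta)

    for err in [0.5, 2.0, 10.0, 100.0]:
        print(err, squared_error_grad(err), huber_grad(err))
    # squared-error gradient: 0.5, 2.0, 10.0, 100.0
    # Huber gradient:         0.5, 1.0,  1.0,   1.0

The squared-error gradient keeps growing with the error, while the Huber gradient saturates at delta, which is exactly why it tames large updates.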

This also suggests that you're correct that the problem isn't exploding gradients. Exploding gradients refers to several layers each contributing a gradient factor with magnitude larger than 1, so the magnitude of the gradient compounds multiplicatively as the chain rule is applied through the layers.
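As a rough illustration of that compounding (the numbers are made up, not from the slides): if each of 20 layers contributed a local gradient factor of about 1.5, the backpropagated gradient would be scaled by roughly 1.5**20:

    # toy numbers: per-layer gradient factors multiply under the chain rule
    factor, layers = 1.5, 20
    print(factor ** layers)  # ~3325, i.e. the magnitude grows geometrically with depth

With only the few layers typical of a DQN, there is little room for that kind of geometric blow-up.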

Even though the gradient is not exploding, large gradients can still destabilize training: large steps that point in nearly-orthogonal (or opposing) directions from one update to the next can keep the model from improving, or at least slow its progress.
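If it helps, here is a toy (non-DQN) example of that kind of instability: plain gradient descent on an ill-conditioned quadratic, with a step size chosen arbitrarily for illustration so that the update overshoots along the steep direction. One coordinate converges while the other oscillates with growing magnitude, so the iterates never settle:

    import numpy as np

    # f(x, y) = 0.5 * (x**2 + 100 * y**2); its gradient is (x, 100*y)
    def grad(p):
        return np.array([p[0], 100.0 * p[1]])

    p, lr = np.array([1.0, 1.0]), 0.021   # lr is fine for x but too large for y
    for step in range(10):
        p = p - lr * grad(p)
        print(step, p)
    # x shrinks by a factor of 0.979 per step, while y flips sign and grows by 1.1 per step

Keeping individual gradient steps small, which is what the Huber loss does for large errors, makes this kind of oscillation much less likely.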
