Episode vs Epoch – Differences in Deep Q Learning Terminology

neural networks, q-learning, reinforcement learning, terminology

I am trying to understand the famous paper "Playing Atari with Deep Reinforcement Learning". I am unclear about the difference between an epoch and an episode. In algorithm $1$, the outer loop is over episodes, while in figure $2$ the x-axis is labeled epoch. In the context of reinforcement learning, I'm not clear what an epoch means. Is an epoch an outer loop around the episode loop?

Best Answer

  • one episode = one sequence of states, actions and rewards, which ends with a terminal state. For example, playing an entire game can be considered one episode, the terminal state being reached when one player loses/wins/draws. Sometimes, one may prefer to define one episode as several games (example: "each episode is a few dozen games, because the games go up to score of 21 for either player").
  • one epoch = one forward pass and one backward pass of all the training examples, in neural network terminology.
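
To make the supervised-learning meaning of "epoch" concrete, here is a minimal sketch in plain Python. The names `minibatches` and `count_updates` are made up for illustration, and the training examples are just integers standing in for real data; no actual network is trained.

```python
def minibatches(examples, batch_size):
    """Yield consecutive slices of `examples`, each at most `batch_size` long."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


def count_updates(num_examples, batch_size, num_epochs):
    """Count weight updates when one epoch is one full pass over the data."""
    examples = list(range(num_examples))  # stand-in for real training examples
    updates = 0
    for _ in range(num_epochs):  # one iteration of this loop = one epoch
        for _batch in minibatches(examples, batch_size):
            # A real loop would do a forward and a backward pass on _batch here.
            updates += 1
    return updates
```

With a fixed dataset the epoch boundary is well defined: every example has been seen exactly once. That is the property that gets lost in the reinforcement learning setting, where the data is generated while training.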

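In the paper you mention, they are more flexible regarding the meaning of epoch: they simply define one epoch as a fixed number of weight updates (the caption of Figure 2 says one epoch corresponds to 50000 minibatch weight updates, or roughly 30 minutes of training time). Since one epoch typically spans many episodes, you can view it, roughly, as an outer loop around the episode loop, as you mentioned in the question.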
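
As a rough illustration of how the two counters relate, here is a toy sketch. It is not the paper's code: `run_training` is a hypothetical name, episode lengths are drawn at random instead of coming from a real environment, and I assume exactly one weight update per environment step.

```python
import random


def run_training(num_episodes, updates_per_epoch, max_episode_len=20, seed=0):
    """Toy loop: count weight updates and record where epoch boundaries fall.

    Assumptions (not from the paper): random episode lengths and one
    weight update per environment step.
    """
    rng = random.Random(seed)
    total_updates = 0
    episode_at_epoch_end = []  # episode during which each epoch finished
    for episode in range(1, num_episodes + 1):
        episode_len = rng.randint(1, max_episode_len)  # steps until terminal state
        for _ in range(episode_len):
            total_updates += 1  # a real agent would do a minibatch update here
            if total_updates % updates_per_epoch == 0:
                episode_at_epoch_end.append(episode)
    return total_updates, episode_at_epoch_end
```

In this setup the code only ever loops over episodes; the epoch is a counter layered on top, used as a unit for the x-axis of a plot. Because episodes have different lengths, an epoch covers a varying number of episodes and its boundary can fall in the middle of one, so the "outer loop" picture is only approximate.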
