Solved – Step-by-step explanation of Experience Replay in DQN

Tags: deep learning, machine learning, neural networks, q-learning, reinforcement learning

I'm trying to understand exactly how experience replay is used in DQN, but I'm not sure I have the procedure right.

Here's how I think it goes:

Observe state s
Take action a
Observe reward r and new state s'
Store the transition (s, a, r, s') in the replay memory D
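For concreteness, here is a minimal sketch of what I mean by the replay memory, in Python (the `ReplayMemory` class and its `capacity` argument are just illustrative names I made up, not from any particular library):

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size buffer of (s, a, r, s') transitions."""

    def __init__(self, capacity=10000):
        # Oldest transitions are discarded once the buffer is full.
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Uniform random sample of stored transitions (without replacement).
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```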

Now my question is: where exactly do we go from here? Do we train the network and update the weights at this point, and then on the next transition, instead of training the network on that transition, sample a random one from the replay memory and update the weights, like below?

Update weights
Take action a
Observe reward r and new state s'
Store the new transition in D
Sample a random transition from D (which at this point has only 2 entries)
Update weights according to the sampled transition
Repeat...
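In code, the loop I have in mind would look roughly like this (re-using the `ReplayMemory` sketch above; `env`, `select_action`, `q_update` and `total_steps` are hypothetical placeholders for the environment, the behaviour policy, the weight update and the step budget, not a real API):

```python
# Rough sketch of the loop described above; env, select_action, q_update
# and total_steps are placeholders, not a real API.
memory = ReplayMemory(capacity=10000)
state = env.reset()

for step in range(total_steps):
    action = select_action(state)                     # take action a
    next_state, reward, done = env.step(action)       # observe r and new state s'
    memory.store(state, action, reward, next_state)   # store the new transition in D

    # Instead of training on the transition just collected, sample a
    # random stored transition and update the weights on that one.
    s, a, r, s_next = memory.sample(1)[0]
    q_update(s, a, r, s_next)

    state = env.reset() if done else next_state
```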

Is this how it's done?

Best Answer

What you are describing at the end is online learning, where we are continually updating the approximation of $Q$.

There is also the possibility of waiting until the end of the episode before sampling from $D$. This can allow you to ground the reward $r$ at each step, in case the reward is delayed.
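As a concrete illustration of both options, here is a minimal sketch that re-uses the `ReplayMemory` above; a small Q-table stands in for the network so the example stays self-contained (in DQN the table lookup would be a forward pass and the update a gradient step on the weights), and it only illustrates when the updates happen, not any re-computation of delayed rewards:

```python
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))   # stand-in for the Q-network
gamma, alpha = 0.99, 0.1              # discount factor and learning rate

def replay_update(transition):
    s, a, r, s_next = transition
    target = r + gamma * Q[s_next].max()   # bootstrapped Q-learning target
    Q[s, a] += alpha * (target - Q[s, a])  # move Q(s, a) toward the target

memory = ReplayMemory(capacity=1000)
memory.store(0, 1, 1.0, 2)   # pretend a couple of transitions were collected
memory.store(2, 0, 0.0, 3)

# Option 1 (online): after every step, update on one randomly sampled transition.
replay_update(memory.sample(1)[0])

# Option 2: wait until the episode ends, then replay a batch of stored samples.
for transition in memory.sample(2):
    replay_update(transition)
```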
