Solved – Feature Scaling in Reinforcement Learning

machine learning, neural networks, reinforcement learning

I am working with RL algorithms such as DQN and Actor-Critic, and I am curious whether there is a correct way to scale the features that represent a state or state-action pair while learning the parameters of the value-function approximator and the policy approximator.

In supervised learning we generally scale features over the whole training set in order to make the objective function better conditioned and reduce learning time, storing the mean and variance (i.e. z-score normalization) and applying them to the test/validation set during evaluation.
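As a concrete illustration of that recipe (a minimal sketch with made-up array names), the statistics are fit on the training set only and reused unchanged on the validation set:

```python
import numpy as np

# Hypothetical data: rows are examples, columns are features on very different scales.
X_train = np.random.rand(1000, 4) * [1.0, 100.0, 0.01, 10.0]
X_val = np.random.rand(200, 4) * [1.0, 100.0, 0.01, 10.0]

# Fit z-score statistics on the training set only.
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0) + 1e-8  # avoid division by zero

# Apply the same stored statistics to both splits.
X_train_scaled = (X_train - mu) / sigma
X_val_scaled = (X_val - mu) / sigma
```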

In RL, data is obtained dynamically via agent-environment interaction, so DQN's replay buffer is updated at every timestep.
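One workaround often used in that streaming setting (a sketch, not something prescribed by the question) is to maintain running statistics that are updated as each new observation arrives, for example with Welford's online algorithm, and normalize observations with the statistics accumulated so far:

```python
import numpy as np

class RunningNormalizer:
    """Tracks a running mean/variance (Welford's algorithm) and z-scores observations."""

    def __init__(self, shape):
        self.count = 0
        self.mean = np.zeros(shape)
        self.m2 = np.zeros(shape)  # running sum of squared deviations

    def update(self, obs):
        self.count += 1
        delta = obs - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (obs - self.mean)

    def normalize(self, obs):
        var = self.m2 / max(self.count - 1, 1)
        return (obs - self.mean) / (np.sqrt(var) + 1e-8)

# Hypothetical usage inside the interaction loop:
# normalizer = RunningNormalizer(obs_dim)
# normalizer.update(obs)
# replay_buffer.add(normalizer.normalize(obs), action, reward, ...)
```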

Normalizing features that have different scales is just as necessary in RL as it is in supervised learning.

Is there a standard process for scaling features correctly in DQN and Actor-Critic methods specifically, given the dynamic nature of RL?

Best Answer

I've seen a number of different tactics used in different projects.

  1. Some people just ignore scaling altogether.
  2. Some people define the environment such that all of the data is implicitly scaled. For example, the information about the state is already scaled as a standard normal distribution.
  3. Some people rely on strategies like layer normalization to do the scaling for them.

If it's possible, (2) seems like the preferred option: if you have the ability to control how the environment relays its data, then you can just solve the problem directly.
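For example (a sketch assuming a Gym/Gymnasium-style environment with known per-dimension observation bounds; the wrapper and bounds are hypothetical), option (2) can be implemented as a thin wrapper that rescales every observation to a fixed range before the agent ever sees it:

```python
import numpy as np

class ScaledObservationWrapper:
    """Rescales raw observations to [-1, 1] using known per-dimension bounds."""

    def __init__(self, env, low, high):
        self.env = env
        self.low = np.asarray(low, dtype=np.float64)
        self.high = np.asarray(high, dtype=np.float64)

    def _scale(self, obs):
        return 2.0 * (obs - self.low) / (self.high - self.low) - 1.0

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        return self._scale(obs), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return self._scale(obs), reward, terminated, truncated, info
```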

If that doesn't work, then layer normalization seems sufficient, since it normalizes each incoming activation vector on the fly using that vector's own mean and variance, with no stored statistics to keep in sync.
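If you go that route, a DQN-style Q-network with layer normalization after each hidden layer might look like this (a PyTorch sketch; the layer sizes are arbitrary placeholders):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """MLP Q-network with layer normalization after each hidden layer."""

    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.LayerNorm(hidden),   # normalizes each sample's activations
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.LayerNorm(hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)

# q = QNetwork(obs_dim=4, n_actions=2)
# q_values = q(torch.randn(32, 4))  # batch of 32 observations
```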

Ignoring scaling seems risky to me, since it's well-known that scaling can dramatically improve the learning process in supervised settings.
