I am using a DDPG agent to run a control algorithm whose inputs (the RL agent's actions, 23 in total) vary between 0 and 1. I am defining this using rlNumericSpec:
actInfo = rlNumericSpec([numAct 1],'LowerLimit',0,'UpperLimit', 1);
Then I am using a tanhLayer as the output of the actor network (similar to the Bipedal Robot example) and creating the representation with
actorOptions = rlRepresentationOptions('Optimizer','adam','LearnRate',1e-4,'GradientThreshold',1,'L2RegularizationFactor',1e-5);
actor = rlRepresentation(actorNetwork,env.getObservationInfo,env.getActionInfo,'Observation',{'observation'},'Action',{'ActorTanh1'},actorOptions);
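For context, here is a minimal sketch of how such an actor network might be wired up. The hidden-layer sizes, layer names, and the imageInputLayer-style input (used in the older rlRepresentation workflow) are assumptions, not taken from my actual code; the relevant detail is that tanhLayer is the final layer feeding the action output:

```matlab
% Sketch of an actor network ending in a tanhLayer (sizes/names are assumptions).
% Note: tanhLayer produces outputs in [-1,1], while the action spec above is [0,1].
actorNetwork = [
    imageInputLayer([numObs 1 1],'Normalization','none','Name','observation')
    fullyConnectedLayer(128,'Name','ActorFC1')
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(numAct,'Name','ActorFC2')
    tanhLayer('Name','ActorTanh1')];
```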
But I observe that the agent mostly outputs the extreme actions, i.e. values at or near 0 and 1.
Would it be better to use a sigmoid function to get better action estimates?
Best Answer