I am using a DDPG agent to run a control algorithm whose inputs (the RL agent's actions, 23 in total) vary between 0 and 1. I am defining this using rlNumericSpec:
actInfo = rlNumericSpec([numAct 1],'LowerLimit',0,'UpperLimit', 1);
Then I am using a tanhLayer as the output of the actor network (similar to the Bipedal Robot example) and creating the representation with
actorOptions = rlRepresentationOptions('Optimizer','adam','LearnRate',1e-4,'GradientThreshold',1,'L2RegularizationFactor',1e-5);
actor = rlRepresentation(actorNetwork,env.getObservationInfo,env.getActionInfo,'Observation',{'observation'},'Action',{'ActorTanh1'},actorOptions);
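For context, here is a minimal sketch of how such an actor network might be wired up. The hidden-layer sizes, layer names, and the imageInputLayer-style input (used in the older rlRepresentation workflow) are assumptions, not taken from my actual code; the relevant detail is that tanhLayer is the final layer feeding the action output:

```matlab
% Sketch of an actor network ending in a tanhLayer (sizes/names are assumptions).
% Note: tanhLayer produces outputs in [-1,1], while the action spec above is [0,1].
actorNetwork = [
    imageInputLayer([numObs 1 1],'Normalization','none','Name','observation')
    fullyConnectedLayer(128,'Name','ActorFC1')
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(numAct,'Name','ActorFC2')
    tanhLayer('Name','ActorTanh1')];
```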
But I observe that the agent mostly outputs the extreme actions, i.e. values at or near 0 and 1.
Would it be better to use a sigmoid function to get better action estimates?
Best Answer