Hello,
I am trying to tune my TD3 agent to solve my custom environment. The environment has two continuous actions: the first in [0, 10] and the second in [0, 2*pi), defined with rlNumericSpec.
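For context, this is how I define the action space (the variable name is mine):

```matlab
% Two continuous actions: first in [0, 10], second in [0, 2*pi)
actInfo = rlNumericSpec([2 1], ...
    'LowerLimit',[0; 0], ...
    'UpperLimit',[10; 2*pi]);
```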
I am following the architecture from this example:
https://in.mathworks.com/help/reinforcement-learning/ug/train-td3-agent-for-pmsm-control.html
Now I have the following questions.
1. Since tanh outputs values in [-1, 1], should I use a scaling layer at the end of the actor network? Maybe with the following values:
scalingLayer('Name','ActorScaling1','Scale',[5;pi],'Bias',[5;pi])
2. How should I set up the exploration noise and the target policy noise? That is, what should their variance values be? Not precisely tuned, but a sensible range, given that I have more than one action and the action range is not [-1, 1].
3. How do I clip those noise values so that the actions stay within the action bounds? I don't see any such option in rlTD3AgentOptions.
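To make question 1 concrete, here is the tail of the actor network as I would write it. My understanding is that scalingLayer computes output = input .* Scale + Bias, so these values should map tanh's [-1, 1] onto my action ranges, but I am not sure this is the right approach:

```matlab
% Tail of the actor network: tanh bounds each output to [-1, 1],
% then scalingLayer maps it channel-wise:
%   channel 1: [-1, 1]*5  + 5  -> [0, 10]
%   channel 2: [-1, 1]*pi + pi -> [0, 2*pi]
actorTail = [
    tanhLayer('Name','ActorTanh')
    scalingLayer('Name','ActorScaling1','Scale',[5;pi],'Bias',[5;pi])
    ];
```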
All the TD3 examples (and most RL examples in general) use actions in [-1, 1]. I am confused about how to modify these parameters when the action space is not within [-1, 1], as in my case.
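For questions 2 and 3, this is what I have pieced together so far, though I am unsure about it: rlTD3AgentOptions exposes ExplorationModel and TargetPolicySmoothModel noise objects, and my guess is to set one standard deviation per action, scaled to each action's range (the ~10% values below are guesses, not tuned), and to use the noise models' LowerLimit/UpperLimit to clip the noise itself:

```matlab
opt = rlTD3AgentOptions;

% Exploration noise: one standard deviation per action,
% guessed at roughly 10% of each action's range (not tuned)
opt.ExplorationModel.StandardDeviation          = [1; 0.2*pi];
opt.ExplorationModel.StandardDeviationDecayRate = 1e-4;

% Target policy smoothing noise, again scaled per action;
% LowerLimit/UpperLimit clip the noise sample, not the action
opt.TargetPolicySmoothModel.StandardDeviation = [1; 0.2*pi];
opt.TargetPolicySmoothModel.LowerLimit        = [-2.5; -0.5*pi];
opt.TargetPolicySmoothModel.UpperLimit        = [ 2.5;  0.5*pi];
```

If I understand correctly, the action bound itself would then be enforced by the tanh + scaling layers in the actor rather than by the noise options, but I would appreciate confirmation.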
Thanks.