MATLAB: Reinforcement Learning Random Action Generator

Tags: action, reinforcement learning, Reinforcement Learning Toolbox

Greetings. I'm Jason, a robotics student, and I would really appreciate your help with the questions below.
Consider that we have an RL environment described as follows:
numObs = 10;                                  % number of observations
ObservationInfo = rlNumericSpec([numObs 1]);  % continuous observation space
ObservationInfo.Name = 'Robot Observations';
numAct = 15;                                  % number of actions
ActionInfo = rlNumericSpec([numAct 1]);       % continuous action space
ActionInfo.UpperLimit = [5; 5; 2; 2; 1; 3; 6; 5; 6; 5; 1; 1; 1; 1; 1];
ActionInfo.LowerLimit = [1; 1; -2; -2; -2; -6; -12; -5; -6; -3; -1; -1; -1; -1; -1];
ActionInfo.Name = 'Robot Actions';
1_ The function step(env, Action) takes the Action and the environment as inputs and implements the robot dynamics. In which part of the code should I describe the Action parameter?
2_ Does the random action generator in the RL Toolbox generate random actions within the range of the upper and lower limits of the ActionInfo? How does the random action generation process work?
3_ Is there a way we can define our own random action generator for an RL agent?
Thanks in advance
Regards
Jason

Best Answer

Hi Jason,
1) I am not really sure what you mean. There are two ways to create custom environments in MATLAB - one is using custom functions, and the other using a custom class template. If the links don't have the answer you are looking for please let me know.
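To show where the Action parameter enters, here is a minimal sketch of the custom-function approach using rlFunctionEnv. The function names (myStepFunction, myResetFunction) and all of the dynamics, reward, and termination logic are hypothetical placeholders; the key point is that the agent's chosen action arrives as the first input argument of the step function.

```matlab
% Sketch (assumed names): build the environment from the specs defined above.
% myStepFunction.m and myResetFunction.m are user-written files on the path.
env = rlFunctionEnv(ObservationInfo, ActionInfo, @myStepFunction, @myResetFunction);

% In myStepFunction.m, the agent's Action arrives as the FIRST input argument:
function [NextObs, Reward, IsDone, LoggedSignals] = myStepFunction(Action, LoggedSignals)
    % Implement the robot dynamics here using Action (a numAct-by-1 vector).
    NextObs = LoggedSignals.State + 0.01*Action(1:10);  % placeholder dynamics
    LoggedSignals.State = NextObs;
    Reward  = -norm(NextObs);                           % placeholder reward
    IsDone  = norm(NextObs) > 100;                      % placeholder termination
end
```

So the Action does not need to be "described" inside step beyond using it in your dynamics; its shape and bounds are already described by the ActionInfo spec passed to rlFunctionEnv.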
2) Which algorithm are you referring to? I am assuming a continuous-action method like DDPG, since your question is about respecting bounds. DDPG explores by adding a random value, drawn from a noise model, to the action generated by the policy. You are responsible for choosing the noise-model parameters so that exploration happens within your desired range; otherwise the actions will simply be clipped to your upper and lower limits. Also make sure your actor ends with a tanh layer followed by a scaling layer, so the policy's raw outputs are shaped into your desired range (noise is added on top of that).
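As a concrete sketch of that last point, tanh squashes the actor output to [-1, 1], and a scaling layer then maps it onto [LowerLimit, UpperLimit]. The network body below is hypothetical; only the final two layers matter here.

```matlab
% Map tanh outputs in [-1, 1] onto the asymmetric action range:
% scale = half the range width, bias = the range midpoint.
scale = (ActionInfo.UpperLimit - ActionInfo.LowerLimit)/2;
bias  = (ActionInfo.UpperLimit + ActionInfo.LowerLimit)/2;

actorLayers = [
    featureInputLayer(numObs, 'Name', 'obs')       % hypothetical input layer
    fullyConnectedLayer(64, 'Name', 'fc1')         % hypothetical hidden layer
    reluLayer('Name', 'relu1')
    fullyConnectedLayer(numAct, 'Name', 'fc_out')
    tanhLayer('Name', 'tanh')                      % squash to [-1, 1]
    scalingLayer('Name', 'scale', 'Scale', scale, 'Bias', bias)];
```

With this tail, the noise-free policy output always lies inside your ActionInfo bounds; only the added exploration noise can push it outside, at which point clipping kicks in.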
3) Again, for DDPG, you can find details of the implemented noise model here. There are many parameters you can change to customize this model, but it is not yet possible to plug in a fully custom one (we are working on it).
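For reference, the Ornstein-Uhlenbeck noise parameters are tuned through the DDPG agent options. The values below are illustrative only, and note that the property names changed across releases (the Variance* properties became StandardDeviation* in newer versions of the toolbox).

```matlab
agentOpts = rlDDPGAgentOptions;
% Ornstein-Uhlenbeck noise parameters (illustrative values; in newer releases
% the Variance* properties are named StandardDeviation*):
agentOpts.NoiseOptions.MeanAttractionConstant = 0.15; % pull noise back toward its mean
agentOpts.NoiseOptions.Variance = 0.3;                % exploration magnitude
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;      % anneal exploration over training
```

Choosing the variance relative to your action range (e.g. a fraction of (UpperLimit - LowerLimit)/2 for each action channel) is a common way to keep exploration mostly inside the bounds.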