MATLAB: Confusion in Critic network architecture design in DDPG

Tags: critic, DDPG, deep learning, Deep Learning Toolbox, reinforcement learning, Reinforcement Learning Toolbox

Hello all,
I am trying to implement the following architecture for a DDPG agent in MATLAB.
"In our design and implementation, we used a 2-layer fullyconnected feedforward neural network to serve as the actor network, which includes 400 and 300 neurons in the first and second layers respectively, and utilized the ReLU function for activation. In the final output layer, we used tanh(ยท) as the activation function to bound the actions.
Similarly, for the critic network, we also used a 2-layer fully-connected feedforward neural network with 400 and 300 neurons in the first and second layers respectively, and with ReLU for activation. Besides, we utilized the L2 weight decay to prevent overfitting."
This is taken from a paper.
Now I have implemented the actor in the following way (don't worry about the hyperparameter values):
actorNetwork = [
    featureInputLayer(numObservations,'Normalization','none','Name','observation')
    fullyConnectedLayer(400,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(300,'Name','fc2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numActions,'Name','fc3')
    tanhLayer('Name','tanh1')
    scalingLayer('Name','ActorScaling1','Scale',[2.5;0.2618],'Bias',[-0.5;0])];
actorOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1,'L2RegularizationFactor',1e-4);
actor = rlDeterministicActorRepresentation(actorNetwork,observationInfo,actionInfo,...
    'Observation',{'observation'},'Action',{'ActorScaling1'},actorOptions);
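(Side note: the scalingLayer applies Scale.*x + Bias to the tanh output, so my Scale and Bias values above imply the action bounds worked out below; the numbers are specific to my environment.)
% How scalingLayer bounds the actions: tanh output is in [-1, 1],
% so each action lies in [Bias - Scale, Bias + Scale] (values copied from the actor above)
scale = [2.5; 0.2618];
bias  = [-0.5; 0];
lowerBound = bias - scale   % [-3.0; -0.2618]
upperBound = bias + scale   % [ 2.0;  0.2618]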
However, I am confused about how to write the code for the critic according to that paper's description. I have done the following:
statePath = [
    featureInputLayer(numObservations,'Normalization','none','Name','observation')
    fullyConnectedLayer(400,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(300,'Name','fc2')
    reluLayer('Name','relu2')
    additionLayer(2,'Name','add')
    fullyConnectedLayer(400,'Name','fc3')
    reluLayer('Name','relu3')
    fullyConnectedLayer(300,'Name','fc4')
    reluLayer('Name','relu4')
    fullyConnectedLayer(1,'Name','fc5')];
actionPath = [
    featureInputLayer(numActions,'Normalization','none','Name','action')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
%criticNetwork = connectLayers(criticNetwork,'fc5','add/in2');
criticOptions = rlRepresentationOptions('LearnRate',1e-03,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNetwork,observationInfo,actionInfo,...
    'Observation',{'observation'},'Action',{'action'},criticOptions);
But I am confused about the 'additionLayer' and the 'actionPath'. Does my implementation match the paper's description?
Can anyone suggest?
Thanks.

Best Answer

Hello,
Does this paper use DDPG as well? Are there any images that show the network architecture? If it's another algorithm, the critic may be implemented as a state-value network V(s).
DDPG uses a Q-network for the critic, which needs to take in both the state and the action (s,a). Reinforcement Learning Toolbox lets you implement this architecture by providing separate input "channels" or paths for the state and the action. That allows you to use different layers in the two paths to extract features more efficiently. See for example the image below:
[Image: example critic network with separate state and action input paths merged before the output]
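As a rough sketch, a two-path critic following the paper's 400/300 layer widths could look like the code below; where the action path joins, its width, and the layer names are assumptions on my part, not taken from the paper:
% Sketch of a critic with separate state and action input paths
statePath = [
    featureInputLayer(numObservations,'Normalization','none','Name','observation')
    fullyConnectedLayer(400,'Name','CriticStateFC1')
    reluLayer('Name','CriticStateRelu1')
    fullyConnectedLayer(300,'Name','CriticStateFC2')];
actionPath = [
    featureInputLayer(numActions,'Normalization','none','Name','action')
    fullyConnectedLayer(300,'Name','CriticActionFC1')];   % match the 300 width so the addition is valid
commonPath = [
    additionLayer(2,'Name','add')
    reluLayer('Name','CriticCommonRelu')
    fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');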
If you want, you can concatenate the observation and action inputs and use a common feature extraction path as follows:
% create a network to be used as underlying critic approximator
statePath = featureInputLayer(numObservations, 'Normalization', 'none', 'Name', 'state');
actionPath = featureInputLayer(numActions, 'Normalization', 'none', 'Name', 'action');
commonPath = [concatenationLayer(1,2,'Name','concat')
    fullyConnectedLayer(400, 'Name', 'CriticStateFC1')
    reluLayer('Name', 'CriticRelu1')
    fullyConnectedLayer(300, 'Name', 'CriticStateFC2')
    reluLayer('Name','CriticRelu2')
    fullyConnectedLayer(1,'Name','StateValue')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = addLayers(criticNetwork, commonPath);
criticNetwork = connectLayers(criticNetwork,'state','concat/in1');
criticNetwork = connectLayers(criticNetwork,'action','concat/in2');
plot(criticNetwork)
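You can then wrap this network in a Q-value representation. Since the paper mentions L2 weight decay for the critic, you can also pass an L2RegularizationFactor; the 1e-4 below is just a placeholder value mirroring your actor options:
% Create the critic representation; the L2RegularizationFactor value is a placeholder
criticOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1,...
    'L2RegularizationFactor',1e-4);
critic = rlQValueRepresentation(criticNetwork,observationInfo,actionInfo,...
    'Observation',{'state'},'Action',{'action'},criticOptions);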
Hope that helps