Hello all,
I am trying to implement the following architecture for a DDPG agent in MATLAB.
"In our design and implementation, we used a 2-layer fullyconnected feedforward neural network to serve as the actor network, which includes 400 and 300 neurons in the first and second layers respectively, and utilized the ReLU function for activation. In the final output layer, we used tanh(ยท) as the activation function to bound the actions.
Similarly, for the critic network, we also used a 2-layer fully-connected feedforward neural network with 400 and 300 neurons in the first and second layers respectively, and with ReLU for activation. Besides, we utilized the L2 weight decay to prevent overfitting."
This is taken from a paper.
Now I have implemented the actor in the following way (don't worry about the hyperparameters):
actorNetwork = [
    featureInputLayer(numObservations,'Normalization','none','Name','observation')
    fullyConnectedLayer(400,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(300,'Name','fc2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numActions,'Name','fc3')
    tanhLayer('Name','tanh1')
    scalingLayer('Name','ActorScaling1','Scale',[2.5;0.2618],'Bias',[-0.5;0])];

actorOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1,...
    'L2RegularizationFactor',1e-4);

actor = rlDeterministicActorRepresentation(actorNetwork,observationInfo,actionInfo,...
    'Observation',{'observation'},'Action',{'ActorScaling1'},actorOptions);
However, I am confused about how to write the code for the critic according to that paper description. I have done the following:
statePath = [
    featureInputLayer(numObservations,'Normalization','none','Name','observation')
    fullyConnectedLayer(400,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(300,'Name','fc2')
    reluLayer('Name','relu2')
    additionLayer(2,'Name','add')   % 'add/in2' is currently left unconnected
    fullyConnectedLayer(400,'Name','fc3')
    reluLayer('Name','relu3')
    fullyConnectedLayer(300,'Name','fc4')
    reluLayer('Name','relu4')
    fullyConnectedLayer(1,'Name','fc5')];

actionPath = [
    featureInputLayer(numActions,'Normalization','none','Name','action')];

criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
%criticNetwork = connectLayers(criticNetwork,'fc5','add/in2');

criticOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);

critic = rlQValueRepresentation(criticNetwork,observationInfo,actionInfo,...
    'Observation',{'observation'},'Action',{'action'},criticOptions);
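Also, since the paper says the critic uses L2 weight decay, I assume I should set 'L2RegularizationFactor' in the critic options too, as I already do for the actor. Something like the following (the 1e-4 value is just my guess, since the paper excerpt does not give the decay coefficient):

criticOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1,...
    'L2RegularizationFactor',1e-4);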
But I am confused about the 'additionLayer' and the 'actionPath'. Does my implementation match the paper's description?
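For comparison, here is the pattern I think the critic is supposed to follow, based on the standard MATLAB DDPG examples, where the action path is projected to the same width as the second hidden layer and joined there via the additionLayer. The layer names like 'fcAction', and the choice of where the action enters the network, are my assumptions, since the paper excerpt does not say at which layer the action is fed in:

statePath = [
    featureInputLayer(numObservations,'Normalization','none','Name','observation')
    fullyConnectedLayer(400,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(300,'Name','fc2')
    additionLayer(2,'Name','add')   % state and action features are summed here
    reluLayer('Name','relu2')
    fullyConnectedLayer(1,'Name','fcOut')];

actionPath = [
    featureInputLayer(numActions,'Normalization','none','Name','action')
    fullyConnectedLayer(300,'Name','fcAction')];   % project action to 300 units

criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = connectLayers(criticNetwork,'fcAction','add/in2');

Is this closer to what the paper means?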
Can anyone suggest?
Thanks.