MATLAB: How to extract a trained RL Agent’s network’s weights and biases

How can I extract a trained RL Agent's network's weights and biases?

My network is:

statePath = [
    imageInputLayer([numObservations 1 1], 'Normalization', 'none', 'Name', 'state')
    fullyConnectedLayer(NumNeuron, 'Name', 'CriticStateFC1')
    reluLayer('Name', 'CriticRelu1')
    fullyConnectedLayer(NumNeuron, 'Name', 'CriticStateFC2')];
actionPath = [
    imageInputLayer([1 1 1], 'Normalization', 'none', 'Name', 'action')
    fullyConnectedLayer(NumNeuron, 'Name', 'CriticActionFC1')
    reluLayer('Name', 'ActorRelu1')
    fullyConnectedLayer(NumNeuron, 'Name', 'CriticActionFC2')];
commonPath = [
    additionLayer(2,'Name', 'add')
    reluLayer('Name','CriticCommonRelu')
    fullyConnectedLayer(1, 'Name', 'output')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = addLayers(criticNetwork, commonPath);    
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC2','add/in2');
% set some options for the critic
criticOpts = rlRepresentationOptions('LearnRate',learing_rate,...
                                      'GradientThreshold',1);
% create the critic based on the network approximator
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,...
    'Observation',{'state'},'Action',{'action'},criticOpts);
agent = rlDQNAgent(critic,agentOpts);
trainingStats = train(agent,env,trainOpts);

After training, I'd like to get the network's trained weights and biases.

Best Answer

You can get the parameters from the trained agent's critic representation. For a DQN agent in MATLAB R2020a, use the getCritic and getLearnableParameters functions (the function names changed slightly after R2019b). The same steps apply to the actor of an actor-based agent such as DDPG or PPO: call getActor instead of getCritic.

critic = getCritic(agent);
criticParams = getLearnableParameters(critic);
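
To inspect the result, the following is a minimal sketch. It assumes criticParams is a cell array with one numeric array per weight matrix or bias vector. The stand-in values, their sizes (4 observations, 24 neurons), and the variable name numericParams are hypothetical and only there so the snippet runs without a trained agent. The order of entries and whether they are plain arrays or dlarray objects may depend on the release, so check the sizes against your layers rather than relying on position.

```matlab
% Stand-in for the output of getLearnableParameters(critic).
% Hypothetical sizes: one 24-by-4 weight matrix and one 24-by-1 bias vector.
if ~exist('criticParams', 'var')
    criticParams = {rand(24, 4, 'single'), rand(24, 1, 'single')};
end

% Convert every entry to a plain double array and report its size.
numericParams = cell(size(criticParams));
for k = 1:numel(criticParams)
    p = criticParams{k};
    if isa(p, 'dlarray')
        % Assumption: some releases may return dlarray objects.
        p = extractdata(p);
    end
    numericParams{k} = double(p);
    fprintf('Parameter %d: size %s\n', k, mat2str(size(numericParams{k})));
end
```

Matching each printed size against the layer definitions (for example, NumNeuron-by-numObservations for the weights of CriticStateFC1) is a safer way to tell weights from biases than assuming a fixed order.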
