Hello,
Are pre-trained recurrent networks re-initialized when used as actors in agents for reinforcement learning? If so, how can this be avoided?
I am importing an LSTM network trained with supervised learning as the actor for a PPO agent. When simulating without training, the reward is fine; however, if the agent is trained, the reward drops as if no pre-trained network had been used. I would expect the reward to be similar or higher after training, so presumably the network is being re-initialized. Is there a way around this?
Thanks
% Load actor
load(netDir);
actorNetwork = net.Layers;
actorOpts = rlRepresentationOptions('LearnRate',learnRate);
actor = rlStochasticActorRepresentation(actorNetwork,obsInfo,actInfo, ...
    'Observation',{'input'},actorOpts);

% Create critic
criticNetwork = [ ...
    sequenceInputLayer(numObs,"Name","input")
    lstmLayer(numObs)
    softplusLayer()
    fullyConnectedLayer(1)];
criticOpts = rlRepresentationOptions('LearnRate',learnRate);
critic = rlValueRepresentation(criticNetwork,obsInfo, ...
    'Observation',{'input'},criticOpts);

% Create agent
agentOpts = rlPPOAgentOptions('ExperienceHorizon',expHorizon, ...
    'MiniBatchSize',miniBatchSz, ...
    'NumEpoch',nEpoch, ...
    'ClipFactor',0.1);
agent = rlPPOAgent(actor,critic,agentOpts);

% Train agent
trainOpts = rlTrainingOptions('MaxEpisodes',episodes, ...
    'MaxStepsPerEpisode',episodeSteps, ...
    'Verbose',false, ...
    'Plots','training-progress', ...
    'StopTrainingCriteria','AverageReward', ...
    'StopTrainingValue',10);

% Run training
trainingStats = train(agent,env,trainOpts);

% Simulate
simOptions = rlSimulationOptions('MaxSteps',2000);
experience = sim(env,agent,simOptions);
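One way to narrow this down is to check whether the weights actually change when the agent is constructed, before any training happens. A diagnostic sketch, assuming the same variables and the same representation-based Reinforcement Learning Toolbox API as in the code above:

% Diagnostic sketch (assumes actor, agent, etc. from the code above are in scope).
% Compare the learnable parameters of the standalone actor representation with
% those of the actor stored inside the agent, before calling train().
paramsOriginal = getLearnableParameters(actor);           % actor built from net.Layers
paramsInAgent  = getLearnableParameters(getActor(agent)); % actor copied into the agent
% If the pre-trained weights survived agent creation, these should match:
sameWeights = isequal(paramsOriginal, paramsInAgent);

If sameWeights is true before training but the reward still collapses during train(), the pre-trained weights were preserved and the drop comes from the PPO updates themselves (e.g. the learning rate or an untrained critic), not from re-initialization.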