I am trying to control a custom Simulink environment with a PPO agent.
However, the following error occurs when creating the actor, and I have not been able to get past it.
How can I fix this?
Error using rl.representation.rlStochasticActorRepresentation (line 32)
Number of outputs for a continuous stochastic actor representation must be two times the number of actions.

Error in rlStochasticActorRepresentation (line 139)
Rep = rl.representation.rlStochasticActorRepresentation(...
My code:
clear all
motion_time_constant = 0.01;
mdl = 'fivelinkrl';
open_system(mdl)

Ts = 0.05;
Tf = 20;

agentblk = [mdl '/RL Agent'];

numObs = 15;
obsInfo = rlNumericSpec([numObs 1]);
obsInfo.Name = 'observations';

numAct = 5;
actInfo = rlNumericSpec([numAct 1],'LowerLimit',-10,'UpperLimit',10);
actInfo.Name = 'Action';

% define environment
env = rlSimulinkEnv(mdl,agentblk,obsInfo,actInfo);

% create PPO agent
criticLayerSizes = [400 300];
actorLayerSizes = [400 300];
createNetworkWeights;

criticNetwork = [
    imageInputLayer([numObs 1 1],'Normalization','none','Name','observations')
    fullyConnectedLayer(criticLayerSizes(1),'Name','CriticFC1', ...
        'Weights',weights.criticFC1, ...
        'Bias',bias.criticFC1)
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(criticLayerSizes(2),'Name','CriticFC2', ...
        'Weights',weights.criticFC2, ...
        'Bias',bias.criticFC2)
    reluLayer('Name','CriticRelu2')
    fullyConnectedLayer(1,'Name','CriticOutput', ...
        'Weights',weights.criticOut, ...
        'Bias',bias.criticOut)];

criticOpts = rlRepresentationOptions('LearnRate',1e-3);
critic = rlValueRepresentation(criticNetwork,env.getObservationInfo, ...
    'Observation',{'observations'},criticOpts);

actorNetwork = [
    imageInputLayer([numObs 1 1],'Normalization','none','Name','observations')
    fullyConnectedLayer(actorLayerSizes(1),'Name','ActorFC1', ...
        'Weights',weights.actorFC1, ...
        'Bias',bias.actorFC1)
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(actorLayerSizes(2),'Name','ActorFC2', ...
        'Weights',weights.actorFC2, ...
        'Bias',bias.actorFC2)
    reluLayer('Name','ActorRelu2')
    fullyConnectedLayer(numAct,'Name','Action', ...
        'Weights',weights.actorOut, ...
        'Bias',bias.actorOut)
    softmaxLayer('Name','actionProbability')];

actorOptions = rlRepresentationOptions('LearnRate',1e-3);

%%%% ↓ error %%%%%%%%%%%%%%%%%
actor = rlStochasticActorRepresentation(actorNetwork,obsInfo,actInfo, ...
    'Observation',{'observations'},actorOptions);
%%%% ↑ error %%%%%%%%%%%%%%%%%%
opt = rlPPOAgentOptions('ExperienceHorizon',512, ...
    'ClipFactor',0.2, ...
    'EntropyLossWeight',0.02, ...
    'MiniBatchSize',64, ...
    'NumEpoch',3, ...
    'AdvantageEstimateMethod','gae', ...
    'GAEFactor',0.95, ...
    'SampleTime',0.05, ...
    'DiscountFactor',0.9995);
agent = rlPPOAgent(actor,critic,opt);

% train agent
maxEpisodes = 4000;
maxSteps = floor(Tf/Ts);
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes',maxEpisodes, ...
    'MaxStepsPerEpisode',maxSteps, ...
    'ScoreAveragingWindowLength',250, ...
    'Verbose',false, ...
    'Plots','training-progress', ...
    'StopTrainingCriteria','EpisodeCount', ...
    'StopTrainingValue',maxEpisodes, ...
    'SaveAgentCriteria','EpisodeCount', ...
    'SaveAgentValue',maxEpisodes);
trainingStats = train(agent,env,trainOpts);
save('agent.mat','agent')

% result in simulation
simOptions = rlSimulationOptions('MaxSteps',maxSteps);
experience = sim(env,agent,simOptions);
Best Answer
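The error means exactly what it says: because actInfo is an rlNumericSpec, rlStochasticActorRepresentation builds a continuous Gaussian actor, and the network must end with 2*numAct outputs (numAct mean values followed by numAct standard deviations, which must be nonnegative). Your actor ends with a fullyConnectedLayer of only numAct outputs followed by a softmaxLayer, which is the pattern for a discrete action space. Below is a sketch of a corrected actor using the two-path mean/standard-deviation pattern from the rlStochasticActorRepresentation documentation. It assumes R2020a or later for softplusLayer; the layer names ('MeanFC', 'SdevFC', etc.) are illustrative, and the explicit 'Weights'/'Bias' arguments are dropped for brevity (default initialization works, or you can extend createNetworkWeights to cover the new layers).

% shared trunk: same hidden layers as the original actor
commonPath = [
    imageInputLayer([numObs 1 1],'Normalization','none','Name','observations')
    fullyConnectedLayer(actorLayerSizes(1),'Name','ActorFC1')
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(actorLayerSizes(2),'Name','ActorFC2')
    reluLayer('Name','ActorRelu2')];

% mean path: numAct outputs, squashed and scaled to the [-10,10] action range
meanPath = [
    fullyConnectedLayer(numAct,'Name','MeanFC')
    tanhLayer('Name','MeanTanh')
    scalingLayer('Name','Mean','Scale',10)];

% standard-deviation path: numAct outputs, kept nonnegative by softplus
sdevPath = [
    fullyConnectedLayer(numAct,'Name','SdevFC')
    softplusLayer('Name','Sdev')];

% concatenate along the channel dimension -> 2*numAct outputs in total
actorNetwork = layerGraph(commonPath);
actorNetwork = addLayers(actorNetwork,meanPath);
actorNetwork = addLayers(actorNetwork,sdevPath);
actorNetwork = addLayers(actorNetwork,concatenationLayer(3,2,'Name','MeanAndSdev'));
actorNetwork = connectLayers(actorNetwork,'ActorRelu2','MeanFC');
actorNetwork = connectLayers(actorNetwork,'ActorRelu2','SdevFC');
actorNetwork = connectLayers(actorNetwork,'Mean','MeanAndSdev/in1');
actorNetwork = connectLayers(actorNetwork,'Sdev','MeanAndSdev/in2');

actorOptions = rlRepresentationOptions('LearnRate',1e-3);
actor = rlStochasticActorRepresentation(actorNetwork,obsInfo,actInfo, ...
    'Observation',{'observations'},actorOptions);

The critic, the agent options, and the training script can stay as they are; only the actor network needs the doubled output. The softmaxLayer is only appropriate when the action spec is an rlFiniteSetSpec (discrete actions).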