I'm working on a project that tries to tune a motor's PID controller parameters using reinforcement learning. Here is my idea:
I can modify MATLAB's rlwatertank example, replacing the water tank with a motor and a PID controller. The agent's output is the PID gains Kp, Ki and Kd. After the agent outputs Kp, Ki and Kd, I run a simulation and measure the errors in the motor's step response, such as percent overshoot, settling time or steady-state error. Then I use these errors to calculate the reward and send it to the agent.
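To make the "simulate, measure, score" idea concrete, here is a minimal Python sketch (not MATLAB) of turning one set of gains into a reward. It assumes a hypothetical first-order motor speed model, tau * dw/dt = -w + k_motor * u, integrated with forward Euler, and an arbitrary weighted-penalty reward. The function names, model constants and reward weights are illustrative assumptions, not part of rlwatertank or any toolbox.

```python
def simulate_step(kp, ki, kd, k_motor=1.0, tau=0.5, dt=0.001, t_end=3.0):
    """Unit-step response of a hypothetical first-order motor speed model,
    tau * dw/dt = -w + k_motor * u, under PID control (forward Euler).

    Note: this explicit scheme roughly needs kd * k_motor / tau < 1 to stay stable.
    """
    n = int(round(t_end / dt))
    w = 0.0          # motor speed
    integral = 0.0   # running integral of the error
    prev_err = 1.0   # initial error of a unit step, so no derivative kick at t = 0
    ys = []
    for _ in range(n):
        err = 1.0 - w
        integral += err * dt
        deriv = (err - prev_err) / dt
        u = kp * err + ki * integral + kd * deriv
        prev_err = err
        w += dt * (-w + k_motor * u) / tau
        ys.append(w)  # ys[i] is the speed at time (i + 1) * dt
    return ys


def step_metrics(ys, dt, setpoint=1.0, band=0.02):
    """Percent overshoot, 2 % settling time and steady-state error of a step response."""
    overshoot = max(0.0, (max(ys) - setpoint) / setpoint * 100.0)
    settling = 0.0
    # Walk backwards to find the time of the last sample outside the +/- band.
    for i in range(len(ys) - 1, -1, -1):
        if abs(ys[i] - setpoint) > band * setpoint:
            settling = (i + 1) * dt
            break
    ss_error = abs(setpoint - ys[-1])  # final sample taken as the steady state
    return overshoot, settling, ss_error


def reward_from_metrics(overshoot, settling, ss_error, w_os=0.1, w_ts=1.0, w_ss=10.0):
    """Hypothetical weighted penalty: larger errors give a more negative reward."""
    return -(w_os * overshoot + w_ts * settling + w_ss * ss_error)


if __name__ == "__main__":
    dt = 0.001
    for gains in [(2.0, 0.0, 0.0), (2.0, 5.0, 0.0)]:
        os_, ts, ess = step_metrics(simulate_step(*gains, dt=dt), dt)
        print(gains, round(os_, 2), round(ts, 3), round(ess, 4),
              round(reward_from_metrics(os_, ts, ess), 3))
```

With a P-only controller this model keeps a steady-state error of 1/(1 + Kp), so adding integral action should earn a clearly better (less negative) reward. In a real setup the motor model would be the Simulink plant, and MATLAB's `stepinfo` can report overshoot and settling time instead of the hand-written `step_metrics`.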
The problem is that I don't know how to give the agent its reward after each simulation has finished, instead of while the simulation is still running. Does anyone have an idea?
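One common way to get a reward only after the whole simulation has finished is to make each episode a single step: the agent outputs the gains once, the environment runs the complete step-response simulation inside that one step, returns the reward computed from it, and immediately reports that the episode is done. In MATLAB's Reinforcement Learning Toolbox, a custom environment built with `rlFunctionEnv` uses a step function that returns the next observation, the reward and an `IsDone` flag, so it could follow the same pattern. The Python sketch below shows only the structure; `PidTuningEnv`, `toy_evaluate` and the random-search loop standing in for the agent are all hypothetical.

```python
import random


class PidTuningEnv:
    """One-step episodic environment: each episode is one complete simulation.

    The agent acts once (proposes Kp, Ki, Kd); step() runs the whole
    simulation via `evaluate` and only then returns the reward, with done=True.
    """

    def __init__(self, evaluate):
        self.evaluate = evaluate  # hypothetical callable: (kp, ki, kd) -> reward
        self._done = True

    def reset(self):
        self._done = False
        return [1.0]  # constant observation, e.g. the step setpoint

    def step(self, action):
        if self._done:
            raise RuntimeError("call reset() before step()")
        kp, ki, kd = action
        reward = self.evaluate(kp, ki, kd)  # full simulation runs before the reward exists
        self._done = True                   # episode ends after this single action
        return [1.0], reward, True, {}


def toy_evaluate(kp, ki, kd):
    # Stand-in for the real step-response reward; best at (2, 5, 0).
    return -((kp - 2.0) ** 2 + (ki - 5.0) ** 2 + kd ** 2)


env = PidTuningEnv(toy_evaluate)
rng = random.Random(0)
best = None
for episode in range(200):
    obs = env.reset()
    # Random search stands in for the RL agent's policy here.
    action = (rng.uniform(0, 5), rng.uniform(0, 10), rng.uniform(0, 1))
    obs, reward, done, info = env.step(action)
    if best is None or reward > best[0]:
        best = (reward, action)
print(best)
```

Because every episode has exactly one step, there is no intermediate reward to worry about: the reward the agent sees is always the one computed from a finished simulation.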
Thanks a lot in advance.