Suboptimality gap in reinforcement learning

machine learningoptimization

I was reading some research papers on Reinforcement Learning Theory, and I constantly encountered a term called the suboptimality gap. As I searched the internet, I couldn't find any information about this term. So, I wonder whether anyone here knows what this means?

Best Answer

In Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs, suboptimality gap associate with action $a$ at state $x$ is defined to be

$$gap_\infty(x,a)=V^{\pi^*}(x)-Q^{\pi^*}(x,a),$$

It is the difference in the value of a particular action from a particular state as compared to the optimal move.

Similar term has been used in bandit problem as well.

Related Question