Proving a function satisfying Bellman’s equation is optimal

dynamic programming, machine learning, optimization

The Question

I'd like to prove that a value function $V$ (as in reinforcement learning) is optimal if and only if it satisfies the Bellman equation. Many places online reference this fact, but none that I have found actually prove it.

Formal definitions

In reinforcement learning, a value function $V$ is used to derive a greedy policy: $$\pi_{V}\left(a\mid s\right)=\begin{cases}
1 & a=\underset{a'}{\mathrm{argmax}}\,Q^{V}\left(s,a'\right)\\
0 & \text{otherwise}
\end{cases}$$
where $$Q^{V}\left(s,a\right)=r\left(s,a\right)+\gamma\underset{s'\sim p\left(s'\mid s,a\right)}{\mathbb{E}}\left[V\left(s'\right)\right]$$ (here $r$ is the reward function and $p\left(s'\mid s,a\right)$ is the probability of transitioning from state $s$ to state $s'$ under action $a$)
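To make these definitions concrete, here is a minimal sketch for a tabular MDP. The arrays `R` and `P` and the sizes `S`, `A` are illustrative assumptions, not part of the question:

```python
import numpy as np

# Illustrative finite MDP (all names here are assumptions for this sketch):
# R[s, a]      -- expected reward r(s, a), shape (S, A)
# P[s, a, s']  -- transition probability p(s' | s, a), shape (S, A, S)
S, A, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
R = rng.normal(size=(S, A))
P = rng.random(size=(S, A, S))
P /= P.sum(axis=2, keepdims=True)      # normalize rows into valid distributions

def Q_from_V(V):
    """Q^V(s, a) = r(s, a) + gamma * E_{s' ~ p(.|s,a)}[V(s')]."""
    return R + gamma * P @ V           # (S, A, S) @ (S,) -> (S, A)

def greedy_policy(V):
    """pi_V: probability 1 on argmax_{a'} Q^V(s, a') in each state."""
    return np.argmax(Q_from_V(V), axis=1)   # one action index per state
```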

A policy $\pi^{\star}$ is called optimal if $$\pi^{\star}=\underset{\pi}{\mathrm{argmax}}\,\mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty}\gamma^{t}\cdot r\left(s_{t},a_{t}\right)\mid s_{0}=s\right]$$ where $s$ is the initial state, and the expectation is over trajectories drawn from the transition probability $p$ while following the policy $\pi$.
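For a fixed deterministic policy, this expected discounted return can be computed exactly on a finite MDP by solving a linear system. A hedged sketch, reusing the illustrative `R`, `P`, `gamma` from above:

```python
def evaluate_policy(pi):
    """V^pi(s) = E_pi[ sum_t gamma^t r(s_t, a_t) | s_0 = s ], computed
    exactly by solving the linear system (I - gamma * P_pi) V = r_pi."""
    idx = np.arange(S)
    r_pi = R[idx, pi]                  # r(s, pi(s)), shape (S,)
    P_pi = P[idx, pi]                  # p(s' | s, pi(s)), shape (S, S)
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
```

The solve is well posed because $\gamma<1$ keeps the spectral radius of $\gamma P_{\pi}$ below one, so $I-\gamma P_{\pi}$ is invertible.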

We also say that $V$ is optimal if $\pi_{V}$ is optimal.

The theorem I want to prove is that a function $V^{\star}$ is optimal if and only if it satisfies the Bellman equation:

$$V^{\star}\left(s\right)=\max_{a}\left[r\left(s,a\right)+\gamma\underset{s'\sim p\left(s'\mid s,a\right)}{\mathbb{E}}\left[V^{\star}\left(s'\right)\right]\right]=\max_{a}Q^{V^{\star}}\left(s,a\right)$$
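Numerically, the theorem suggests a simple check: compute the sup-norm Bellman residual $\left\Vert V-TV\right\Vert _{\infty}$, where $T$ denotes the Bellman backup, and test whether it is (approximately) zero. A sketch under the same assumptions as above:

```python
def bellman_backup(V):
    """(T V)(s) = max_a [ r(s, a) + gamma * E[V(s')] ]."""
    return Q_from_V(V).max(axis=1)

def bellman_residual(V):
    """Sup-norm distance between V and T V; zero (up to floating-point
    tolerance) exactly when V satisfies the Bellman equation."""
    return np.max(np.abs(bellman_backup(V) - V))
```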

Best Answer

A friend of mine showed me the following "post", and the proof is actually quite short.

They prove that $$\lim_{k\to\infty}V_{k}=V^{\star}$$ where $V_{k}$ is the value-iteration sequence defined in the post, $V_{k+1}=TV_{k}$ with $T$ the Bellman backup operator; the key step is that $T$ is a $\gamma$-contraction in the sup norm, so by the Banach fixed-point theorem the iterates converge to the unique fixed point $V^{\star}$. Now if the original $V$ already satisfies the Bellman equation, then $TV=V$, so starting the iteration from $V_{0}=V$ gives $V_{k}=V$ for every $k$, and therefore $$\lim_{k\to\infty}V_{k}=V$$ so $V^{\star}=V$ by uniqueness of the limit. The other direction is of course easy: $V^{\star}$ is itself a fixed point of $T$, i.e. it satisfies the Bellman equation.
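As a sanity check, the fixed-point argument can be replayed numerically on the illustrative MDP from above: value iteration converges, the resulting $V^{\star}$ has (near-)zero Bellman residual, and re-running the iteration from $V^{\star}$ leaves it unchanged.

```python
def value_iteration(V0, tol=1e-10, max_iters=10_000):
    """Iterate V_{k+1} = T V_k; the gamma-contraction property guarantees
    convergence to the unique fixed point V*."""
    V = V0.copy()
    for _ in range(max_iters):
        TV = bellman_backup(V)
        if np.max(np.abs(TV - V)) < tol:
            return TV
        V = TV
    return V

V_star = value_iteration(np.zeros(S))
print(bellman_residual(V_star))                           # ~ 0: solves Bellman
print(np.max(np.abs(value_iteration(V_star) - V_star)))   # ~ 0: fixed point
```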
