Optimal value function: Cosmos — All that is, or was, or ever will be

Optimal value function

cosmos 15th July 2017 at 6:52pm

The Value function of an Optimal policy in Reinforcement learning.

It is the solution of the Bellman optimality equation, and is unique for finite MDPs (with discount factor $\gamma < 1$).

Optimal value function

$V^* (s) = \max\limits_\pi V^\pi (s)$

Bellman optimality equation for $V^*$ (aka the Bellman optimality equation; see the derivation, although I think the way to do it is to first treat the initial action $a$ as independent of $\pi$, and then realize that maximizing over $a$ should give $V^*$, so that $a$ should be $\pi(s_0)$ for the optimal policy $\pi$):

$V^* (s) = R(s) + \gamma \max\limits_a \sum\limits_{s'} P_{s a} (s') V^* (s')$
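
One way to see the equation at work is value iteration, which repeatedly applies the right-hand side as an update until $V$ stops changing. The following is only a minimal sketch, assuming a state-only reward `R` of shape `(S,)` and a transition array `P` of shape `(S, A, S)` with `P[s, a, s2]` standing for $P_{s a}(s')$. The function name `value_iteration`, the parameters `tol` and `max_iter`, and the two-state toy MDP are all hypothetical choices made for illustration, not something taken from this note.

```python
import numpy as np


def value_iteration(R, P, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Approximate V* by iterating the Bellman optimality backup.

    R: array of shape (S,), reward R(s) (assumed to depend on the state only).
    P: array of shape (S, A, S), P[s, a, s2] = probability of s -> s2 under a.
    Returns (V, policy), where policy[s] is a greedy action with respect to V.
    """
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        # expected[s, a] = sum_{s'} P_{sa}(s') V(s')
        expected = P @ V
        # V(s) <- R(s) + gamma * max_a expected[s, a]
        V_new = R + gamma * expected.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy: the action attaining the max in the equation.
    policy = (P @ V).argmax(axis=1)
    return V, policy


# Hypothetical toy MDP: 2 states, 2 actions (0 = stay, 1 = switch state).
R = np.array([0.0, 1.0])
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0  # stay in state 0
P[0, 1, 1] = 1.0  # switch from state 0 to state 1
P[1, 0, 1] = 1.0  # stay in state 1
P[1, 1, 0] = 1.0  # switch from state 1 to state 0

V, policy = value_iteration(R, P, gamma=0.9)
print(V, policy)
```

In this toy example the equation can be solved by hand: staying in state 1 gives $V^*(1) = 1 + 0.9\, V^*(1)$, so $V^*(1) = 10$, and switching from state 0 gives $V^*(0) = 0 + 0.9 \cdot 10 = 9$. The sketch should converge to approximately those values, with the greedy policy "switch" in state 0 and "stay" in state 1.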