Optimal value function

cosmos 15th July 2017 at 6:52pm
Value function

The Value function of an Optimal policy in Reinforcement learning.

It is the solution of the Bellman optimality equation; for finite MDPs with discount factor $\gamma < 1$, this solution is unique.

Optimal value function

$$V^*(s) = \max\limits_\pi V^\pi(s)$$

Bellman optimality equation for $V^*$ (aka Bellman optimality equation, derivation; although I think the way to do it is to first treat $a$ as independent of $\pi$, and then to realize that maximizing over $a$ should give $V^*$, and so $a$ should be $\pi(s_0)$):

$$V^*(s) = R(s) + \gamma \max\limits_a \sum\limits_{s'} P_{sa}(s') V^*(s')$$
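One way to see the two characterizations of $V^*$ agree is to compute both on a toy problem. The sketch below is only an illustration: the 2-state, 2-action MDP (`R`, `P`, `GAMMA`) and the helper names `value_iteration` and `policy_value` are made up for this example. It repeatedly applies the right-hand side of the Bellman optimality equation until the values stop changing (value iteration), and separately takes the maximum of $V^\pi$ over all deterministic policies, which is assumed to be enough for a finite discounted MDP.

```python
from itertools import product

# Hypothetical toy MDP, invented for illustration only.
GAMMA = 0.9
R = [0.0, 1.0]  # R[s]: reward for being in state s
# P[s][a][s2]: probability of moving from state s to state s2 under action a
P = [
    [[1.0, 0.0], [0.2, 0.8]],  # state 0: action 0 stays put, action 1 mostly moves to state 1
    [[0.5, 0.5], [0.0, 1.0]],  # state 1: action 0 is a coin flip, action 1 stays put
]


def value_iteration(R, P, gamma, tol=1e-10):
    """Iterate V(s) <- R(s) + gamma * max_a sum_s' P_sa(s') V(s') to a fixed point."""
    n = len(R)
    V = [0.0] * n
    while True:
        V_new = [
            R[s] + gamma * max(
                sum(P[s][a][s2] * V[s2] for s2 in range(n))
                for a in range(len(P[s]))
            )
            for s in range(n)
        ]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new


def policy_value(R, P, gamma, policy, tol=1e-10):
    """Evaluate a fixed deterministic policy (policy[s] is the action taken in s)."""
    n = len(R)
    V = [0.0] * n
    while True:
        V_new = [
            R[s] + gamma * sum(P[s][policy[s]][s2] * V[s2] for s2 in range(n))
            for s in range(n)
        ]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new


# Fixed point of the Bellman optimality equation.
V_star = value_iteration(R, P, GAMMA)

# Definition V*(s) = max_pi V^pi(s), by brute force over deterministic policies.
all_policies = list(product(range(2), repeat=len(R)))
V_brute = [
    max(policy_value(R, P, GAMMA, pi)[s] for pi in all_policies)
    for s in range(len(R))
]

print(V_star, V_brute)
```

On this toy MDP the two lists should agree up to the convergence tolerance. Brute-force enumeration grows exponentially with the number of states, so it is only useful here as a sanity check.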