can be seen as solving the Bellman optimality equation iteratively. It is also similar to policy iteration, except that only a single sweep of policy evaluation is performed before each policy improvement.
Iterate:
- $V_{i+1}(s) := \max_a \left\{ \sum_{s'} P_a(s, s') \left( R_a(s, s') + \gamma V_i(s') \right) \right\}$
until it converges to $V^*$. After the iterations converge, compute the optimal policy from its definition (a short sketch of both steps follows this list):
- $\pi^*(s) = \arg\max_a \sum_{s'} P_a(s, s') \left( R_a(s, s') + \gamma V^*(s') \right)$
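To make the two steps concrete, here is a minimal Python sketch. The two-state MDP, the transition table `P`, and the names `GAMMA` and `THETA` are illustrative assumptions, not part of the notes; the loop applies the Bellman optimality update until the values stop changing, then reads off the greedy policy from $V^*$.

```python
# Minimal value-iteration sketch on a made-up toy MDP.
# P[s][a] is a list of (next_state, probability, reward) triples.

GAMMA = 0.9      # discount factor (assumed value)
THETA = 1e-8     # convergence threshold (assumed value)

# Hypothetical two-state, two-action MDP for illustration only.
P = {
    0: {"stay": [(0, 1.0, 0.0)], "go": [(1, 0.8, 5.0), (0, 0.2, 0.0)]},
    1: {"stay": [(1, 1.0, 1.0)], "go": [(0, 1.0, 0.0)]},
}

def q_value(V, s, a):
    """Expected return of action a in state s: sum_s' P_a(s,s') (R_a(s,s') + gamma * V(s'))."""
    return sum(p * (r + GAMMA * V[s_next]) for s_next, p, r in P[s][a])

# Value iteration: repeat the Bellman optimality update until V converges.
V = {s: 0.0 for s in P}
while True:
    delta = 0.0
    for s in P:
        v_new = max(q_value(V, s, a) for a in P[s])
        delta = max(delta, abs(v_new - V[s]))
        V[s] = v_new
    if delta < THETA:
        break

# Policy extraction: pick the action that maximizes the one-step lookahead under V*.
policy = {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}
print(V, policy)
```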
example – more explanation – demo