Optimal policy: Cosmos — All that is, or was, or ever will be

Optimal policy

cosmos 15th July 2017 at 6:55pm

In Reinforcement learning, a policy which maximizes the expected reward (as quantified by a Value function) from every state. It is any policy that assigns non-zero probability only to actions that are greedy with respect to the Optimal value function.
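The second sentence can be illustrated with a minimal Python sketch for a single state. It assumes the value of each action in that state is already known (for example the expected optimal value of the next state). The function name `greedy_stochastic_policy`, the tolerance `tol` and the action names are hypothetical. The sketch builds one valid optimal policy for the state by spreading probability uniformly over the greedy (tied-best) actions and giving zero to every other action.

```python
def greedy_stochastic_policy(action_values, tol=1e-9):
    """Return a distribution over actions with non-zero mass only on greedy actions.

    action_values: dict mapping action -> value of taking that action in this state
    (hypothetical input, e.g. the expected optimal value of the next state).
    """
    best = max(action_values.values())
    # Greedy actions: those whose value is within `tol` of the maximum.
    greedy = [a for a, v in action_values.items() if v >= best - tol]
    p = 1.0 / len(greedy)
    # Uniform over the greedy actions and zero elsewhere; any other split of the
    # probability among the greedy actions would be equally optimal.
    return {a: (p if a in greedy else 0.0) for a in action_values}


print(greedy_stochastic_policy({"left": 1.0, "right": 3.0, "stay": 3.0}))
# {'left': 0.0, 'right': 0.5, 'stay': 0.5}
```

Choosing the uniform split is only a convenience; a deterministic policy that always picks one greedy action is also optimal.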

$\pi^*(s) = \arg\max\limits_a \sum\limits_{s'} P_{s a} (s') V^* (s')$

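Here $P_{sa}(s')$ is the probability of reaching state $s'$ after taking action $a$ in state $s$, and $V^*$ is the Optimal value function. A policy chosen this way maximizes the expected total payoff, i.e. it is the solution of the optimal policy problem.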
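A minimal Python sketch of the formula for $\pi^*(s)$, under assumptions not stated on this page: a finite MDP given as nested lists `P[s][a][s']` holding $P_{sa}(s')$, a reward `R[s]` that depends only on the state, a discount factor `gamma`, and $V^*$ approximated by value iteration on the Bellman equation $V^*(s) = R(s) + \gamma \max_a \sum_{s'} P_{sa}(s') V^*(s')$. The function names and the toy chain MDP are hypothetical, chosen only for illustration.

```python
def expected_value(dist, V):
    """sum_{s'} P_{sa}(s') V(s') for one next-state distribution `dist`."""
    return sum(p * v for p, v in zip(dist, V))


def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Approximate V* by repeating the Bellman backup
    V(s) <- R(s) + gamma * max_a sum_{s'} P[s][a][s'] * V(s').

    Assumption of this sketch: R[s] depends only on the state.
    """
    n = len(R)
    V = [0.0] * n
    for _ in range(max_iter):
        V_new = [
            R[s] + gamma * max(expected_value(P[s][a], V) for a in range(len(P[s])))
            for s in range(n)
        ]
        # Stop once successive estimates agree to within `tol` in every state.
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


def optimal_policy(P, V_star):
    """pi*(s) = argmax_a sum_{s'} P_{sa}(s') V*(s'); ties go to the lowest action index."""
    policy = []
    for s in range(len(P)):
        scores = [expected_value(P[s][a], V_star) for a in range(len(P[s]))]
        policy.append(max(range(len(scores)), key=scores.__getitem__))
    return policy


# Hypothetical 3-state chain: action 0 stays put, action 1 moves one state to
# the right (the last state is absorbing). Only the last state pays a reward.
n = 3
P = [
    [
        [1.0 if t == s else 0.0 for t in range(n)],
        [1.0 if t == min(s + 1, n - 1) else 0.0 for t in range(n)],
    ]
    for s in range(n)
]
R = [0.0, 0.0, 1.0]

V_star = value_iteration(P, R, gamma=0.9)
print([round(v, 3) for v in V_star])  # [8.1, 9.0, 10.0]
print(optimal_policy(P, V_star))      # [1, 1, 0]
```

In the last state both actions lead to the same next state, so both are greedy; the sketch reports action 0 only because of its tie-breaking rule, and by the definition above any mix of the two would be optimal as well.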