Optimal policy

cosmos 15th July 2017 at 6:55pm
Policy Reinforcement learning

A policy which maximizes the expected reward (quantified by a Value function) over all states in Reinforcement learning. It is any policy which assigns non-zero probability only to greedy actions with respect to the Optimal value function.

Optimal policy

$$\pi^*(s) = \arg\max\limits_a \sum\limits_{s'} P_{sa}(s') V^*(s')$$

This policy maximizes the expected total payoff from every state, i.e. it is the solution of the optimal policy problem.
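
As a rough illustration of the formula above, here is a minimal sketch of greedy policy extraction. It assumes the optimal value function is already known and that the transition probabilities are stored as a nested list `P[s][a][s']`. The function name `greedy_policy`, the toy 2-state MDP and the values in `V_star` are all made up for the example; they are not computed here and do not come from this note.

```python
def greedy_policy(P, V):
    """Return, for each state s, an action a maximizing sum_{s'} P[s][a][s'] * V[s'].

    P: nested list, P[s][a][s2] = probability of moving from s to s2 under action a
    V: list, V[s2] = (assumed known) optimal value of state s2
    """
    policy = []
    for s in range(len(P)):
        # Expected next-state value of each action available in state s.
        expected = [
            sum(p * v for p, v in zip(P[s][a], V))
            for a in range(len(P[s]))
        ]
        # Ties are broken in favor of the lowest action index.
        best_action = max(range(len(expected)), key=expected.__getitem__)
        policy.append(best_action)
    return policy


# Hypothetical toy MDP: 2 states, 2 actions (numbers chosen for illustration).
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # state 0: action 0, action 1
    [[0.1, 0.9], [0.7, 0.3]],  # state 1: action 0, action 1
]
V_star = [1.0, 5.0]  # assumed optimal values, not derived here

print(greedy_policy(P, V_star))
```

This sketch returns a single deterministic action per state. When several actions tie, any distribution over the tied actions would be equally optimal, which is why the definition above allows any policy that puts probability only on greedy actions.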

How to compute the optimal policy