A policy that maximizes the expected reward (quantified by a value function) over all states in reinforcement learning. It is any policy that assigns non-zero probability only to actions that are greedy with respect to the optimal value function.
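The greedy condition can be written out as a short sketch. The notation here is an assumption, since the entry itself fixes no symbols: $\pi$ is a policy, $V^{\pi}$ its value function, $\gamma$ a discount factor, and $r$, $s'$ the reward and next state after taking action $a$ in state $s$.

```latex
\begin{align*}
V^{*}(s)   &= \max_{\pi} V^{\pi}(s)
  && \text{(optimal value function)} \\
Q^{*}(s,a) &= \mathbb{E}\left[\, r + \gamma V^{*}(s') \mid s, a \,\right]
  && \text{(value of taking $a$ in $s$, then acting optimally)} \\
\pi^{*}(a \mid s) > 0 \;&\Longrightarrow\; a \in \arg\max_{a'} Q^{*}(s,a')
  && \text{(only greedy actions get probability mass)}
\end{align*}
```

Under this reading, several optimal policies may exist when actions tie, but they all share the same value function $V^{*}$.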
It is the policy that maximizes the expected total payoff, i.e. the solution of the optimal policy problem.
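As a minimal sketch of the definition, the following computes an optimal value function by value iteration and then reads off the greedy actions in each state. The two-state MDP, the discount factor `GAMMA`, and the helper names (`q_value`, `value_iteration`, `greedy_actions`) are all hypothetical and chosen only for illustration; value iteration is one common way to solve the optimal policy problem, not the only one.

```python
# Hypothetical 2-state, 2-action MDP, used only for illustration.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9  # assumed discount factor


def q_value(V, s, a):
    """Expected reward plus discounted value of the next state."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def value_iteration(tol=1e-10):
    """Apply the Bellman optimality update until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        new_V = {s: max(q_value(V, s, a) for a in P[s]) for s in P}
        if max(abs(new_V[s] - V[s]) for s in P) < tol:
            return new_V
        V = new_V


def greedy_actions(V, s, eps=1e-8):
    """All actions whose value is within eps of the best one in state s."""
    q = {a: q_value(V, s, a) for a in P[s]}
    best = max(q.values())
    return [a for a in q if best - q[a] <= eps]


V_star = value_iteration()
# Any policy that puts probability only on these actions is optimal.
policy = {s: greedy_actions(V_star, s) for s in P}
print(V_star, policy)
```

In this toy problem the values converge to roughly 19 for state 0 and 20 for state 1, and the only greedy action in both states is action 1. If two actions tied, `greedy_actions` would return both, and any distribution over them would still be an optimal policy.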