The value function of an optimal policy in reinforcement learning, usually denoted $v_*$.
It is the solution of the Bellman optimality equation, and is unique for finite MDPs.
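To make the fixed-point / uniqueness claim concrete, here is a minimal value-iteration sketch (not from the original note) on a made-up 2-state, 2-action finite MDP: repeatedly applying the Bellman optimality backup converges to the same $v_*$ from any starting point. The names `P`, `R`, and `gamma` and the numbers are illustrative assumptions.

```python
# Minimal value-iteration sketch on a hypothetical 2-state, 2-action finite MDP.
# P[s, a, s'] is the transition probability, R[s, a] the expected reward,
# gamma the discount factor -- all names and values here are illustrative.
import numpy as np

P = np.array([[[0.9, 0.1],    # s=0, a=0
               [0.2, 0.8]],   # s=0, a=1
              [[0.0, 1.0],    # s=1, a=0
               [0.5, 0.5]]])  # s=1, a=1
R = np.array([[0.0, 1.0],
              [2.0, 0.0]])
gamma = 0.9

v = np.zeros(2)  # any starting point converges to the same fixed point
for _ in range(1000):
    # Bellman optimality backup: v(s) <- max_a [ R(s,a) + gamma * sum_s' P(s,a,s') v(s') ]
    q = R + gamma * (P @ v)   # shape (S, A): action values under the current v
    v_new = q.max(axis=1)
    if np.max(np.abs(v_new - v)) < 1e-10:
        break
    v = v_new

print(v)  # approximates v_*, the unique solution of the Bellman optimality equation
```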
Bellman optimality equation for $v_*$ (i.e., the derivation of the Bellman optimality equation): I think the way to do it is to first treat $q_*$ as independent of $v_*$, and then realize that maximizing over $a$ should give $v_*$ (and so $\max_a q_*(s, a)$ should be $v_*(s)$):
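A sketch of that substitution in standard MDP notation (the specific symbols $v_*$, $q_*$, $a$, and $p(s', r \mid s, a)$ are my reconstruction of the stripped inline math, not taken verbatim from the note): first write $q_*$ in terms of $v_*$,

$$
q_*(s, a) = \mathbb{E}\bigl[R_{t+1} + \gamma\, v_*(S_{t+1}) \mid S_t = s,\, A_t = a\bigr]
          = \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma\, v_*(s')\bigr],
$$

then use $v_*(s) = \max_a q_*(s, a)$ to eliminate $q_*$ and obtain the Bellman optimality equation for $v_*$:

$$
v_*(s) = \max_a \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma\, v_*(s')\bigr].
$$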