See sec 8.7 of Sutton-Barto
A minimization problem which can be formulated as a Reinforcement learning problem which satisfies these conditions:
- The inital value of every goal state is zero
- there exists at least one policy that guarantees that a goal state will be reached with probability one from any start state
- all rewards for transitions from non-goal states are strictly negative
- all the initial values are equal to, or greater than, their optimal values (which can be satisfied by simply setting the initial values to zero)
Real-time dynamic programming converges for these problems