Stochastic optimal path problem: Cosmos — All that is, or was, or ever will be

Stochastic optimal path problem

cosmos 18th July 2017 at 2:40pm

See sec 8.7 of Sutton-Barto

A minimization problem which can be formulated as a Reinforcement learning problem which satisfies these conditions:

The inital value of every goal state is zero
there exists at least one policy that guarantees that a goal state will be reached with probability one from any start state
all rewards for transitions from non-goal states are strictly negative
all the initial values are equal to, or greater than, their optimal values (which can be satisfied by simply setting the initial values to zero)