Stochastic optimal path problem

cosmos 18th July 2017 at 2:40pm

See sec 8.7 of Sutton-Barto

A minimization problem which can be formulated as a Reinforcement learning problem which satisfies these conditions:

  • The inital value of every goal state is zero
  • there exists at least one policy that guarantees that a goal state will be reached with probability one from any start state
  • all rewards for transitions from non-goal states are strictly negative
  • all the initial values are equal to, or greater than, their optimal values (which can be satisfied by simply setting the initial values to zero)

Real-time dynamic programming converges for these problems