RTDP (Real-Time Dynamic Programming)
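RTDP is a form of asynchronous dynamic programming that uses on-policy trajectory sampling to choose which states are backed up: only the states visited along trajectories generated by the current policy have their values updated.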
RTDP, which generalizes Korf's Learning Real-Time A* (LRTA*) to stochastic problems, is typically applied to stochastic optimal path problems.
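A minimal sketch of the idea in Python may help. Everything named here is an assumption made for illustration, not part of the notes: the function `rtdp`, the `transitions` and `cost` callbacks, the 5-state chain problem, and the 0.8 success probability. The sketch also assumes greedy action selection and zero-initialized values (optimistic when costs are positive).

```python
import random

def rtdp(states, actions, transitions, cost, start, goals,
         episodes=200, max_steps=100, seed=0):
    """Sketch of RTDP for a cost-minimizing stochastic optimal path problem.

    transitions(s, a) -> list of (probability, next_state) pairs
    cost(s, a)        -> immediate cost of taking action a in state s
    """
    rng = random.Random(seed)
    # Zero initial values never overestimate the true cost-to-go here.
    V = {s: 0.0 for s in states}

    def q(s, a):
        # Expected one-step cost plus expected value of the successor.
        return cost(s, a) + sum(p * V[s2] for p, s2 in transitions(s, a))

    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            if s in goals:
                break
            # Back up only the state the trajectory is currently visiting.
            best = min(actions, key=lambda a: q(s, a))
            V[s] = q(s, best)
            # Sample the next state under the greedy action (on-policy).
            outcomes = transitions(s, best)
            s = rng.choices([s2 for _, s2 in outcomes],
                            weights=[p for p, _ in outcomes])[0]
    return V

# Hypothetical example: a chain 0..4 with the goal at state 4.
N = 4

def transitions(s, a):
    if a == "right":
        # Moving right succeeds with probability 0.8, otherwise stay put.
        return [(0.8, min(s + 1, N)), (0.2, s)]
    # Moving left always succeeds (clamped at state 0).
    return [(1.0, max(s - 1, 0))]

def cost(s, a):
    return 1.0  # unit cost per step

V = rtdp(range(N + 1), ("right", "left"), transitions, cost, start=0, goals={N})
print(V)
```

In this toy problem each successful step to the right takes 1/0.8 = 1.25 attempts on average, so the learned values should approach 1.25 times the distance to the goal. Note that no full sweep over the state set ever happens: every backup is applied to a state the sampled trajectory actually reaches.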