On-policy trajectory sampling

cosmos 17th July 2017 at 3:17pm

A family of reinforcement learning methods that focus on the states and state-action pairs the agent is likely to encounter when controlling its environment. This lets computation skip parts of the state space that are irrelevant to the prediction or control problem.

It generates sample trajectories by following the current policy, and backs up the states (or state-action pairs) visited along those trajectories.
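A minimal sketch of the idea, assuming a hypothetical five-state chain (states 0 to 4, goal at 4, reward -1 per step) with a known deterministic model. The names `step`, `epsilon_greedy` and `trajectory_sampling` are illustrative only. The agent follows an epsilon-greedy policy, and only the state-action pairs it actually visits receive a backup:

```python
import random

# Hypothetical toy problem: a chain of states 0..4, where 4 is the terminal goal.
N_STATES = 5
GOAL = 4
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GAMMA = 1.0


def step(state, action):
    """Deterministic model of the chain: return (next_state, reward)."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, -1.0


def epsilon_greedy(q, state, epsilon, rng):
    """The policy being followed: mostly greedy in q, sometimes random."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def trajectory_sampling(n_episodes=200, epsilon=0.1, start=2, seed=0, max_steps=100):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    backed_up = set()  # record which pairs ever received a backup
    for _ in range(n_episodes):
        state = start
        for _ in range(max_steps):
            if state == GOAL:
                break
            # The pair to back up is chosen by following the policy.
            action = epsilon_greedy(q, state, epsilon, rng)
            nxt, reward = step(state, action)
            best_next = 0.0 if nxt == GOAL else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] = reward + GAMMA * best_next  # one-step backup
            backed_up.add((state, action))
            state = nxt
    return q, backed_up
```

Calling `q, backed_up = trajectory_sampling()` returns the learned action values together with the set of pairs that were backed up. In a larger problem that set would typically be a small fraction of the full state-action space, which is where the savings come from.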

Real-time dynamic programming (RTDP) uses this idea: it is an on-policy trajectory-sampling version of value iteration.
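As a rough illustration of real-time dynamic programming, the sketch below assumes a hypothetical stochastic chain in which the "right" action slips (stays in place) with probability 0.2 and every step costs -1. The model layout and the names `build_model`, `q_value` and `rtdp` are assumptions made for this example. Each visited state receives a full value-iteration backup, and the next state is sampled by acting greedily:

```python
import random

GOAL = 3  # terminal state of the hypothetical chain 0..3


def build_model(slip=0.2):
    """model[state][action] -> list of (probability, next_state, reward)."""
    model = {}
    for s in range(GOAL):
        model[s] = {
            "left": [(1.0, max(0, s - 1), -1.0)],
            "right": [(1.0 - slip, s + 1, -1.0), (slip, s, -1.0)],
        }
    return model


def q_value(model, v, state, action, gamma):
    """Expected one-step return of taking `action` in `state` under the model."""
    return sum(p * (r + gamma * v[s2]) for p, s2, r in model[state][action])


def rtdp(model, start=0, n_trials=500, gamma=1.0, seed=0, max_steps=200):
    rng = random.Random(seed)
    v = {s: 0.0 for s in list(model) + [GOAL]}  # optimistic initial values
    for _ in range(n_trials):
        state = start
        for _ in range(max_steps):
            if state == GOAL:
                break
            # Value-iteration backup, applied only to the state being visited.
            qs = {a: q_value(model, v, state, a, gamma) for a in model[state]}
            best = max(qs, key=qs.get)
            v[state] = qs[best]
            # Act greedily and sample the next state from the model.
            outcomes = model[state][best]
            state = rng.choices(
                [s2 for _, s2, _ in outcomes],
                weights=[p for p, _, _ in outcomes],
            )[0]
    return v
```

With these assumed dynamics each move to the right takes 1 / 0.8 = 1.25 steps on average, so `rtdp(build_model())` should approach values of about -3.75, -2.5 and -1.25 for states 0, 1 and 2.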