Reinforcement learning methods that focus on the states and state-action pairs the agent is likely to encounter when controlling its environment. This can allow computation to skip over parts of the state space that are irrelevant to the prediction or control problem.
It uses sampled trajectories, generated by following the current policy, to choose which states (or state-action pairs) are backed up.
Real-time dynamic programming (RTDP) uses this idea: it is an on-policy trajectory-sampling version of value iteration.
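
A minimal sketch of the idea, loosely in the spirit of real-time dynamic programming, may help make it concrete. Everything in it is an assumption chosen for illustration: the six-state line world, the reward of -1 per step, the discount factor, the epsilon-greedy exploration, and the names `step`, `backup`, and `trajectory_sampling` are all hypothetical. The point is only that value-iteration backups are applied to the states the agent actually visits, so states 4 and 5, which cannot be reached from the start state, are never touched.

```python
import random

# Hypothetical toy MDP: states 0..5 on a line, state 3 is the terminal goal.
# States 4 and 5 cannot be reached from the start state 0.
N_STATES = 6
GOAL = 3
ACTIONS = (-1, +1)
GAMMA = 0.9


def step(state, action):
    """Deterministic model: move left or right (clipped), reward -1 per step."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    return next_state, -1.0


def backup(V, state):
    """Value-iteration backup of one state in place; returns the greedy action."""
    best_action, best_value = None, float("-inf")
    for action in ACTIONS:
        next_state, reward = step(state, action)
        # The terminal goal state contributes no future value.
        future = 0.0 if next_state == GOAL else GAMMA * V[next_state]
        value = reward + future
        if value > best_value:
            best_action, best_value = action, value
    V[state] = best_value
    return best_action


def trajectory_sampling(episodes=50, epsilon=0.1, start=0, seed=0, max_steps=100):
    """Back up only the states encountered on simulated trajectories."""
    rng = random.Random(seed)
    V = [0.0] * N_STATES
    backed_up = set()
    for _ in range(episodes):
        state = start
        for _ in range(max_steps):
            if state == GOAL:
                break
            greedy_action = backup(V, state)  # back up the state being visited
            backed_up.add(state)
            # Assumed epsilon-greedy behaviour: mostly greedy, sometimes random.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action
            state, _ = step(state, action)
    return V, backed_up


if __name__ == "__main__":
    values, visited = trajectory_sampling()
    print("backed-up states:", sorted(visited))
    print("values:", [round(v, 3) for v in values])
```

An exhaustive sweep would spend a backup on every state in each pass, including the unreachable ones. Here the trajectories themselves decide where the computation goes, at the cost of leaving no estimate at all for states the policy never reaches.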