Model-free reinforcement learning methods in which the policy used to sample experience (the behavior policy) is the same as the policy being evaluated or optimized (the target policy).
These include the basic versions of Monte Carlo learning and temporal-difference (TD) learning.
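
As an illustration, SARSA is a commonly cited on-policy TD control method: the same epsilon-greedy policy both chooses the actions and supplies the next action used in the update target. The sketch below is a minimal version under stated assumptions; the five-state chain environment, the function names (`epsilon_greedy`, `sarsa_update`, `step`, `run_sarsa`), and the hyperparameters are all hypothetical and chosen only for this example.

```python
import random


def epsilon_greedy(q, state, n_actions, epsilon, rng):
    """Sample an action from the epsilon-greedy policy derived from q."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [q[(state, a)] for a in range(n_actions)]
    best = max(values)
    # Break ties at random so the untrained agent does not get stuck.
    return rng.choice([a for a, v in enumerate(values) if v == best])


def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma, done):
    """On-policy TD update: bootstrap from the action actually taken next."""
    target = r if done else r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])


def step(state, action, n_states):
    """Hypothetical chain: action 0 moves left, action 1 moves right.

    Reaching the rightmost state ends the episode with reward 1.
    """
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done


def run_sarsa(n_states=5, n_actions=2, episodes=500,
              alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Run SARSA on the toy chain (all defaults are assumed values)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(n_states) for a in range(n_actions)}
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(q, s, n_actions, epsilon, rng)
        done = False
        while not done:
            s_next, r, done = step(s, a, n_states)
            # The next action comes from the same policy being improved,
            # which is what makes this update on-policy.
            a_next = epsilon_greedy(q, s_next, n_actions, epsilon, rng)
            sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma, done)
            s, a = s_next, a_next
    return q
```

Calling `run_sarsa()` returns a dictionary of action values keyed by `(state, action)`. Replacing `q[(s_next, a_next)]` in the target with the maximum over next actions would give an off-policy update instead, because the target would no longer follow the sampling policy.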