See more at Reinforcement learning
Simple random search provides a competitive approach to reinforcement learning – "Our findings contradict the common belief that policy gradient techniques, which rely on exploration in the action space, are more sample efficient than methods based on finite-differences [25, 26]."
Evaluating the value function given a policy
Introduction to Monte Carlo model-free prediction: just sample runs of the MDP under the policy, and average the empirical returns (discounted sums of rewards) observed from each state.
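A minimal sketch of first-visit Monte Carlo prediction in this spirit, assuming a hypothetical sample_episode(policy) helper that runs one episode of the MDP under the policy and returns a list of (state, reward) pairs (the reward being the one received after leaving that state):

```python
from collections import defaultdict

def mc_prediction(sample_episode, policy, num_episodes, gamma=0.99):
    """First-visit Monte Carlo policy evaluation (sketch).

    `sample_episode(policy)` is assumed to run one episode under `policy`
    and return [(s_0, r_1), (s_1, r_2), ...].
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    V = defaultdict(float)
    for _ in range(num_episodes):
        episode = sample_episode(policy)
        # Walk backwards, accumulating the discounted return G_t from each step.
        G = 0.0
        first_return = {}
        for state, reward in reversed(episode):
            G = reward + gamma * G
            first_return[state] = G  # earliest (first) visit overwrites later ones
        # Average the first-visit returns into the value estimate.
        for state, G_first in first_return.items():
            returns_sum[state] += G_first
            returns_count[state] += 1
            V[state] = returns_sum[state] / returns_count[state]
    return V
```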
Incremental Monte Carlo update
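Equivalently, a sketch of the incremental form (tabular V, step size alpha), so we don't have to store and re-average every past return:

```python
def incremental_mc_update(V, state, G, alpha):
    """Incremental Monte Carlo update: nudge V[state] toward the observed
    return G. With alpha = 1/N(state) this recovers the exact running mean;
    a constant alpha forgets old episodes (handy for non-stationary problems)."""
    V[state] += alpha * (G - V[state])
```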
Simple example comparing monte carlo vs TD0
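For the comparison, a sketch of the TD(0) prediction update (same tabular-V conventions as above): TD(0) bootstraps from the current estimate of the next state after every single transition, whereas the Monte Carlo update above has to wait for the complete return at the end of the episode.

```python
def td0_update(V, state, reward, next_state, alpha, gamma=0.99):
    """TD(0) update: the target is the one-step bootstrapped estimate
    reward + gamma * V[next_state], applied after every transition."""
    td_target = reward + gamma * V[next_state]
    V[state] += alpha * (td_target - V[state])
```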
We actually need to use the action-value function Q(s, a) to be model-free: greedy improvement over V(s) requires a model of the MDP to do the one-step lookahead over transitions, whereas greedy improvement over Q(s, a) is just argmax_a Q(s, a), with no model needed.
We are basically going to use policy iteration with the action-value function, with different ways to do the policy evaluation step (by sampling) and the policy improvement step (in a way that explores enough, given that sampling means we don't see everything). This is an instance of generalized policy iteration with the Q function evaluated by sampling (model-free).
motivation – Exploration versus exploitation. We need to carry on exploring everything to make sure we understand the value of all options!
epsilon-greedy exploration – there is a policy-improvement theorem for epsilon-greedy policies: for any epsilon-greedy policy pi, the epsilon-greedy policy with respect to q_pi is at least as good, v_pi'(s) >= v_pi(s) for all s, so epsilon-greedy policy iteration still improves monotonically.
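A sketch of epsilon-greedy action selection over a tabular Q (assumed here to be a defaultdict keyed by (state, action) tuples, with a fixed finite action list): every action keeps probability at least epsilon/m, and the greedy action gets 1 - epsilon + epsilon/m, which is what the improvement theorem relies on.

```python
import random

def epsilon_greedy_action(Q, state, actions, epsilon):
    """Pick a uniformly random action with probability epsilon, otherwise
    the greedy action argmax_a Q[(state, a)]."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])
```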
Making policy iteration more efficient by doing only partial policy evaluation before each improvement step (e.g. evaluating from just a single episode rather than many).
GLIE (Greedy in the Limit with Infinite Exploration) is a condition on the exploration schedule: every state-action pair is explored infinitely often, and the policy converges to a greedy policy. GLIE Monte-Carlo control converges to the optimal action-value function in a model-free manner.
An example is epsilon-greedy policy iteration with a gradual decay of epsilon, e.g. epsilon_k = 1/k.
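A minimal sketch of GLIE Monte-Carlo control under these assumptions, reusing a hypothetical sample_episode(policy) helper (now returning (state, action, reward) triples) and the epsilon_k = 1/k schedule:

```python
from collections import defaultdict
import random

def glie_mc_control(sample_episode, actions, num_episodes, gamma=0.99):
    """GLIE Monte-Carlo control (sketch): evaluate Q from sampled episodes and
    improve with an epsilon-greedy policy whose epsilon decays as 1/k."""
    Q = defaultdict(float)   # action-value estimates, keyed by (state, action)
    N = defaultdict(int)     # visit counts, keyed by (state, action)

    for k in range(1, num_episodes + 1):
        epsilon = 1.0 / k    # GLIE schedule: explores forever, greedy in the limit

        def policy(state):
            if random.random() < epsilon:
                return random.choice(actions)
            return max(actions, key=lambda a: Q[(state, a)])

        episode = sample_episode(policy)  # [(s_0, a_0, r_1), (s_1, a_1, r_2), ...]
        # Incremental every-visit update with step size 1/N(s, a).
        G = 0.0
        for state, action, reward in reversed(episode):
            G = reward + gamma * G
            N[(state, action)] += 1
            Q[(state, action)] += (G - Q[(state, action)]) / N[(state, action)]
    return Q
```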
First attempt: policy iteration with Monte-Carlo policy evaluation – but this isn't very efficient (we have to wait for complete episodes and the returns are high-variance), so we use TD learning methods instead.
introduction to TD learning for control
https://webdocs.cs.ualberta.ca/~sutton/book/ebook/node66.html
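The update described there, as a sketch in the same tabular conventions as above: Sarsa is on-policy TD control, bootstrapping from the action actually taken in the next state by the current (e.g. epsilon-greedy) behaviour policy.

```python
def sarsa_update(Q, state, action, reward, next_state, next_action, alpha, gamma=0.99):
    """Sarsa (on-policy TD control): update Q(s, a) toward
    reward + gamma * Q(s', a'), where a' was actually chosen by the policy."""
    td_target = reward + gamma * Q[(next_state, next_action)]
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
```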
Curiosity – Curiosity-driven Exploration by Self-supervised Prediction (Pathak et al.); see also earlier work by Schmidhuber on curiosity and intrinsic motivation.