Model-based reinforcement learning

Reinforcement learning

aka Planning

Solving the Bellman equations

https://worldmodels.github.io

https://twitter.com/wgussml/status/1126984030090596354

World models for Atari: https://arxiv.org/pdf/1903.00374.pdf. They train the world model on trajectories that the agent actually explores, rather than on random ones, and they iterate this loop several times.
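
A minimal tabular sketch of that loop (all MDP quantities below are made up for illustration, and the tabular count model is a stand-in for the paper's learned video model, which is trained with a policy-gradient method inside it): collect on-policy transitions, refit the model, plan in the learned model, repeat.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy MDP standing in for the real environment.
n_states, n_actions, gamma = 5, 2, 0.9
P_true = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # P[a, s, s']
R_true = rng.random((n_states, n_actions))                             # R[s, a]

def rollout(pi, n_steps=200, eps=0.1):
    """Collect transitions with the current (epsilon-greedy) policy."""
    s, data = 0, []
    for _ in range(n_steps):
        a = pi[s] if rng.random() > eps else rng.integers(n_actions)
        s2 = rng.choice(n_states, p=P_true[a, s])
        data.append((s, a, R_true[s, a], s2))
        s = s2
    return data

pi = rng.integers(n_actions, size=n_states)        # initial random policy
counts = np.ones((n_actions, n_states, n_states))  # smoothed transition counts
R_est = np.zeros((n_states, n_actions))

for _ in range(10):                                # outer iterations
    for s, a, r, s2 in rollout(pi):                # on-policy data only
        counts[a, s, s2] += 1
        R_est[s, a] = r                            # rewards are deterministic here
    P_est = counts / counts.sum(axis=2, keepdims=True)
    V = np.zeros(n_states)                         # plan inside the learned model
    for _ in range(200):
        V = (R_est + gamma * (P_est @ V).T).max(axis=1)
    pi = (R_est + gamma * (P_est @ V).T).argmax(axis=1)
print("final policy:", pi)
```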

Linear programming
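
A hedged sketch of the linear-programming formulation: minimize the sum of values subject to V(s) >= R(s,a) + gamma * sum_s' P(s'|s,a) V(s') for every state-action pair; the toy MDP numbers are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linprog

# Assumed 2-state, 2-action MDP: P[a, s, s'] transition probs, R[s, a] rewards.
n_states, n_actions, gamma = 2, 2, 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],   # action 0
              [[0.5, 0.5], [0.3, 0.7]]])  # action 1
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])

# min sum_s V(s)  s.t.  V(s) >= R[s,a] + gamma * sum_s' P[a,s,s'] V(s')
A_ub, b_ub = [], []
for s in range(n_states):
    for a in range(n_actions):
        A_ub.append(-np.eye(n_states)[s] + gamma * P[a, s])  # -(e_s - gamma P)
        b_ub.append(-R[s, a])
res = linprog(c=np.ones(n_states), A_ub=np.array(A_ub), b_ub=b_ub,
              bounds=[(None, None)] * n_states)
print("V* =", res.x)  # the minimizer is the optimal value function
```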

Dynamic programming

Idea – Tradeoffs. The idea is to solve the consistency equations (derived via a look-ahead tree and the principle of optimality) iteratively (see Fixed-point iteration). – Summary of methods
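
As a concrete instance, iterative policy evaluation applies the Bellman expectation backup as a fixed-point map until it converges (a sketch with assumed toy numbers):

```python
import numpy as np

# Fixed-point iteration on the Bellman expectation equation V = R_pi + gamma * P_pi V.
gamma = 0.9
P_pi = np.array([[0.9, 0.1],
                 [0.2, 0.8]])      # state-to-state transitions under a fixed policy
R_pi = np.array([1.0, 0.0])        # expected one-step reward under that policy

V = np.zeros(2)
for _ in range(1000):
    V_new = R_pi + gamma * P_pi @ V   # one Bellman backup (the fixed-point map)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
print(V)  # converges because the backup is a gamma-contraction
```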

Neuro-dynamic programming

Policy iteration
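
A minimal sketch on an assumed toy MDP: alternate exact policy evaluation with greedy policy improvement until the policy stops changing.

```python
import numpy as np

# Assumed tabular MDP: P[a, s, s'] transition probs, R[s, a] expected rewards.
n_states, n_actions, gamma = 2, 2, 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])

pi = np.zeros(n_states, dtype=int)          # start from an arbitrary policy
while True:
    # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
    P_pi = P[pi, np.arange(n_states)]       # row s is P[pi[s], s, :]
    R_pi = R[np.arange(n_states), pi]
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
    # Policy improvement: act greedily w.r.t. the one-step look-ahead.
    Q = R + gamma * (P @ V).T               # Q[s, a]
    pi_new = Q.argmax(axis=1)
    if np.array_equal(pi_new, pi):
        break
    pi = pi_new
print("optimal policy:", pi, "V*:", V)
```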

Value iteration
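
A matching value-iteration sketch on the same kind of assumed toy MDP: apply the Bellman optimality backup until the values converge, then read off the greedy policy.

```python
import numpy as np

n_states, n_actions, gamma = 2, 2, 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])   # P[a, s, s']
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])                 # R[s, a]

V = np.zeros(n_states)
while True:
    Q = R + gamma * (P @ V).T              # full-width one-step look-ahead
    V_new = Q.max(axis=1)                  # Bellman optimality backup
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new
print("V* =", V, "greedy policy:", Q.argmax(axis=1))
```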

Extensions to dynamic programming:

There are other algorithms described in the Wiki page:

  • Trust Region Policy Optimization [1]
  • Proximal Policy Optimization (like TRPO, but using a penalty on the KL divergence instead of a constraint), where each subproblem is solved with either SGD or L-BFGS; see the sketch after this list
  • Cross Entropy Method
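
A hedged sketch of the penalty form of the PPO objective mentioned above (all numbers are assumed, and `ppo_penalty_loss` is a hypothetical helper, not the reference implementation): maximize the importance-weighted advantage minus a KL penalty between old and new policies.

```python
import numpy as np

def ppo_penalty_loss(logp_new, logp_old, adv, kl, beta=1.0):
    """Negative penalized surrogate: minimize -E[ratio * adv] + beta * KL."""
    ratio = np.exp(logp_new - logp_old)       # importance-sampling ratio
    return -(ratio * adv).mean() + beta * kl

# Toy batch (assumed): log-probs of the taken actions under new/old policy, advantages.
logp_new = np.log(np.array([0.6, 0.3, 0.8]))
logp_old = np.log(np.array([0.5, 0.4, 0.7]))
adv = np.array([1.0, -0.5, 0.2])
kl = 0.01                                     # estimated KL(old || new) over the batch
print(ppo_penalty_loss(logp_new, logp_old, adv, kl, beta=3.0))
```
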
Final comment on DP methods: DP uses full-width look-ahead, and it assumes we know the dynamics. Instead we can sample –> leads to Model-free reinforcement learning
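
To make the contrast concrete, a sampled backup (here TD(0), with assumed toy transitions) replaces the full-width expectation over next states with a single observed transition, requiring no access to the dynamics:

```python
import numpy as np

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One sampled backup: bootstrap from a single observed transition."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V

V = np.zeros(3)
for s, r, s_next in [(0, 1.0, 1), (1, 0.0, 2), (2, 2.0, 0)] * 100:  # assumed data
    V = td0_update(V, s, r, s_next)
print(V)
```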

Asynchronous DP

Real-time dynamic programming (RTDP), which uses On-policy trajectory sampling
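
A minimal RTDP sketch (toy MDP assumed): only the states visited along greedy on-policy trajectories receive Bellman backups, asynchronously and in place, rather than full sweeps over the state space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy MDP; RTDP still uses a known model for its one-step look-aheads.
n_states, n_actions, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # P[a, s, s']
R = rng.random((n_states, n_actions))                             # R[s, a]

V = np.zeros(n_states)
for _ in range(500):                     # trajectories from a fixed start state
    s = 0
    for _ in range(20):                  # trajectory length
        Q = R[s] + gamma * P[:, s] @ V   # one-step look-ahead at s only
        V[s] = Q.max()                   # asynchronous, in-place backup
        a = Q.argmax()                   # act greedily w.r.t. current V
        s = rng.choice(n_states, p=P[a, s])  # sample next state from the model
print(V)
```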


Combining model-free with model-based RL

https://deepmind.com/blog/agents-imagine-and-plan/ – Learning model-based planning from scratch

Using Generative models and Environment models

Empowerment