aka Planning
Solving the Bellman equations
https://twitter.com/wgussml/status/1126984030090596354
World models for Atari: https://arxiv.org/pdf/1903.00374.pdf. They train the world model on trajectories the agent actually explores, rather than on random ones, and they iterate this loop (collect data, train the model, train the policy inside the model) several times; see the toy sketch below.
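A toy, self-contained sketch of that iterated loop (hedged: the actual paper uses a video-prediction world model and PPO; here a tabular count model and ε-greedy value iteration stand in for both, and `step(s, a) -> (s2, r)` is an assumed environment interface):

```python
import numpy as np

def world_model_loop(step, n_states, n_actions, start, gamma=0.9,
                     n_iters=5, n_episodes=20, horizon=30, seed=0):
    """step(s, a) -> (s2, r) is the real environment (assumed interface)."""
    rng = np.random.default_rng(seed)
    counts = np.ones((n_states, n_actions, n_states))  # smoothed transition counts
    rewards = np.zeros((n_states, n_actions))          # assume deterministic rewards
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        # 1. Collect real trajectories with the CURRENT (eps-greedy) policy,
        #    so the model is trained where the agent actually goes.
        for _ in range(n_episodes):
            s = start
            for _ in range(horizon):
                greedy = rng.random() > 0.2
                a = int(Q[s].argmax()) if greedy else int(rng.integers(n_actions))
                s2, r = step(s, a)
                counts[s, a, s2] += 1
                rewards[s, a] = r
                s = s2
        # 2. Fit the world model on those on-policy trajectories.
        P_hat = counts / counts.sum(axis=2, keepdims=True)
        # 3. Improve the policy inside the learned model (cheap value iteration).
        for _ in range(100):
            Q = rewards + gamma * np.einsum("sap,p->sa", P_hat, Q.max(axis=1))
    return Q
```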
Idea: solve the consistency equations (derived from a lookahead tree and the principle of optimality) iteratively (see Fixed-point iteration), as in the sketch below.
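A minimal sketch of this fixed-point iteration (value iteration), assuming a small tabular MDP with known transition tensor `P` and reward table `R` (both illustrative names, not from these notes):

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-6):
    """P: (S, A, S) transition probabilities; R: (S, A) expected rewards."""
    V = np.zeros(R.shape[0])
    while True:
        # One application of the Bellman optimality operator:
        # Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) V(s')
        Q = R + gamma * np.einsum("sap,p->sa", P, V)
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:   # contraction => unique fixed point
            return V_new, Q.argmax(axis=1)  # optimal values and greedy policy
        V = V_new

# Tiny random MDP to exercise the sketch
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(3, 2))  # 3 states, 2 actions
R = rng.standard_normal((3, 2))
V, pi = value_iteration(P, R)
```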
Extensions to dynamic programming:
There are other algorithms described in the Wikipedia page, e.g.:
Real-time dynamic programming (RTDP), which uses on-policy trajectory sampling (sketched below)
A final comment on DP methods: DP uses full-width lookahead, and it assumes we know the dynamics. Instead we can sample → leads to Model-free reinforcement learning.
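A minimal RTDP sketch under the same tabular assumptions as the value-iteration code above: instead of full sweeps, back up only the states visited along trajectories sampled from the current greedy policy:

```python
import numpy as np

def rtdp(P, R, start, gamma=0.9, n_trajectories=100, horizon=50, seed=0):
    """Back up only states visited on greedy trajectories from `start`."""
    rng = np.random.default_rng(seed)
    V = np.zeros(R.shape[0])
    for _ in range(n_trajectories):
        s = start
        for _ in range(horizon):
            q = R[s] + gamma * P[s] @ V  # Bellman backup at the visited state only
            a = int(q.argmax())
            V[s] = q[a]
            # On-policy trajectory sampling: follow the greedy action
            s = rng.choice(R.shape[0], p=P[s, a])
    return V
```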
https://deepmind.com/blog/agents-imagine-and-plan/ – Learning model-based planning from scratch
Using Generative models and Environment models, as in the toy sketch below.
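A toy sketch of the "imagine then act" idea (hedged: the actual agents learn how to construct and use imagined rollouts with neural networks; here we just score each first action by random rollouts in a learned model, with `model(s, a) -> (s2, r)` an assumed interface):

```python
import numpy as np

def plan_by_imagination(model, s0, n_actions, depth=5, gamma=0.95,
                        n_rollouts=8, seed=0):
    """Score each first action by imagined rollouts in the learned model."""
    rng = np.random.default_rng(seed)
    scores = np.zeros(n_actions)
    for a0 in range(n_actions):
        for _ in range(n_rollouts):
            s, a, ret, disc = s0, a0, 0.0, 1.0
            for _ in range(depth):
                s, r = model(s, a)                # imagined step, no real-env cost
                ret += disc * r
                disc *= gamma
                a = int(rng.integers(n_actions))  # random continuation policy
            scores[a0] += ret / n_rollouts
    return int(scores.argmax())                   # act on the best imagined return
```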