Q-learning

cosmos 1st July 2018 at 12:13am
Model-free control

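The idea that works best (as of 2016 or so) is Q-learning, an off-policy method. In its best-known form, both the behaviour and target policies are allowed to improve: the target policy is greedy with respect to Q, while the behaviour policy is ε-greedy with respect to Q.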
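A minimal sketch of tabular Q-learning, to make the two policies concrete. The chain environment, the function name `q_learning`, and all hyperparameter values are assumptions made up for illustration, not something from these notes: states 0..4 in a row, action 0 moves left, action 1 moves right, and reaching the last state gives reward 1 and ends the episode.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical deterministic chain.

    Returns Q as a list of [Q(s, left), Q(s, right)] rows.
    """
    rng = random.Random(seed)
    terminal = n_states - 1
    # Q[terminal] is never updated, so it stays at 0 and the bootstrap
    # term vanishes automatically on the final transition.
    Q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        while s != terminal:
            # Behaviour policy: epsilon-greedy w.r.t. the current Q,
            # breaking ties at random so early episodes still explore.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(Q[s])
                a = rng.choice([x for x in (0, 1) if Q[s][x] == best])

            # Hypothetical environment dynamics (assumed, deterministic).
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == terminal else 0.0

            # Target policy: greedy w.r.t. Q, hence the max over next actions.
            target = r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q


if __name__ == "__main__":
    for state, (left, right) in enumerate(q_learning()):
        print(state, round(left, 3), round(right, 3))
```

The behaviour policy picks the action actually taken, while the update bootstraps from the greedy action in the next state regardless of what the agent does next. Both policies improve together because both are derived from the same Q table.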

friendly intro


aka SARSAMAX
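
The name comes from the update rule: it looks like the SARSA update, except that the sampled next action is replaced by a max over next actions. Written in the usual notation (the step size α and discount factor γ are standard symbols assumed here, not defined elsewhere in this note):

```latex
Q(S_t, A_t) \leftarrow Q(S_t, A_t)
  + \alpha \left[ R_{t+1} + \gamma \max_{a'} Q(S_{t+1}, a') - Q(S_t, A_t) \right]
```

For comparison, SARSA would use $Q(S_{t+1}, A_{t+1})$ in place of $\max_{a'} Q(S_{t+1}, a')$, where $A_{t+1}$ is the action the behaviour policy actually takes next.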