Q-learning: Cosmos — All that is, or was, or ever will be

Q-learning

cosmos 1st July 2018 at 12:13am

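The idea that works best (as of 2016 or so) is Q-learning. This is the most well-known form of Q-learning, in which both the behaviour and target policies are allowed to improve: the target policy is greedy with respect to Q(s, a), while the behaviour policy (e.g. ε-greedy with respect to Q(s, a)) keeps exploring.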

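Also known as SARSAMAX, because the update bootstraps from the maximum action value in the next state rather than from the action actually taken.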
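To make the update concrete, here is a minimal tabular sketch. It assumes the standard rule Q(s, a) ← Q(s, a) + α (r + γ max_a' Q(s', a') − Q(s, a)). The four-state chain environment, the helper names (`step`, `q_update`, `epsilon_greedy`, `train`) and all hyperparameter values are hypothetical, chosen only for illustration and not taken from the note.

```python
import random
from collections import defaultdict

ACTIONS = (-1, +1)  # hypothetical toy actions: move left or right on a chain


def step(state, action, goal=3):
    """Toy deterministic chain 0..goal (an assumption for this sketch).
    Reward is 1.0 on reaching the goal state, 0.0 otherwise."""
    next_state = min(max(state + action, 0), goal)
    done = next_state == goal
    return next_state, (1.0 if done else 0.0), done


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9, done=False):
    """One Q-learning (SARSAMAX) update: the target uses the max over
    next actions, i.e. the greedy target policy."""
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


def epsilon_greedy(Q, s, actions, epsilon, rng):
    """Behaviour policy: random action with probability epsilon,
    otherwise greedy with respect to the current Q."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda b: Q[(s, b)])


def train(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.5, seed=0, max_steps=100):
    """Run tabular Q-learning on the toy chain and return the Q table."""
    rng = random.Random(seed)
    Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = epsilon_greedy(Q, s, ACTIONS, epsilon, rng)
            s_next, r, done = step(s, a)
            q_update(Q, s, a, r, s_next, ACTIONS, alpha, gamma, done)
            s = s_next
            if done:
                break
    return Q
```

Calling `train()` and reading off `max(ACTIONS, key=lambda b: Q[(s, b)])` for each state gives the learned greedy (target) policy. The behaviour policy here acts randomly half the time, yet the update always bootstraps from the greedy action, which is what makes the method off-policy.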