Expected Sarsa: Cosmos — All that is, or was, or ever will be

Expected Sarsa

cosmos 15th July 2017 at 8:35pm

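Like Sarsa, but the bootstrap term in the update for the last visited State-action pair is an expectation over the possible next actions, weighted by the probabilities that a target policy assigns to them in the next state, rather than the value of the single next action that was actually sampled. Because the target policy may differ from the policy that generates the behaviour, the method can be used for Off-policy learning.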
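A minimal tabular sketch of this update, to make the averaging concrete. The names here (expected_sarsa_update, epsilon_greedy_probs, alpha, gamma, target_probs) are illustrative assumptions and do not come from the note. The Q-table is assumed to be a NumPy array of shape (n_states, n_actions), and an epsilon-greedy target policy is used only as an example.

```python
import numpy as np

def epsilon_greedy_probs(q_row, epsilon=0.1):
    """Action probabilities of an epsilon-greedy policy for one state (assumed target policy)."""
    n = len(q_row)
    probs = np.full(n, epsilon / n)                # exploration mass spread uniformly
    probs[int(np.argmax(q_row))] += 1.0 - epsilon  # remaining mass on the greedy action
    return probs

def expected_sarsa_update(Q, s, a, r, s_next, target_probs,
                          alpha=0.1, gamma=0.99, terminal=False):
    """Apply one Expected Sarsa update to Q[s, a] in place and return Q."""
    # Average the next-state action values under the target policy,
    # instead of using the value of one sampled next action as Sarsa does.
    expected_q = 0.0 if terminal else float(np.dot(target_probs, Q[s_next]))
    td_target = r + gamma * expected_q
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Usage: two states, two actions; the target policy is uniform in state 1.
Q = np.zeros((2, 2))
Q[1] = [1.0, 3.0]
expected_sarsa_update(Q, s=0, a=0, r=1.0, s_next=1,
                      target_probs=np.array([0.5, 0.5]),
                      alpha=0.5, gamma=1.0)
```

If target_probs puts all its mass on the greedy action, the expectation reduces to a maximum over next actions, which is the Q-learning target.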