Sarsa: Cosmos — All that is, or was, or ever will be

Sarsa

cosmos 15th July 2017 at 8:07pm

Sarsa is a TD algorithm for policy evaluation of the action-value function q. The whole control algorithm built on it is also known as Sarsa.
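
For reference, the one-step Sarsa update is usually written as below. The symbols are not defined in this note and follow the common textbook convention: \alpha is the step size, \gamma the discount factor, and Q the current estimate of q. The name comes from the quintuple (S_t, A_t, R_{t+1}, S_{t+1}, A_{t+1}) that the update uses.

```latex
Q(S_t, A_t) \leftarrow Q(S_t, A_t)
  + \alpha \left[ R_{t+1} + \gamma \, Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \right]
```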

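With Policy iteration, we can do On-policy control with Sarsa. Using Stochastic approximation theory, it can be shown to converge if the step sizes satisfy the usual conditions (their sum diverges while the sum of their squares is finite) and the policy becomes greedy in the limit while still trying every state-action pair infinitely often.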
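
A minimal sketch of on-policy control with tabular Sarsa, to make the loop concrete. Everything named here is an assumption for illustration: the chain environment `chain_step`, the helper `epsilon_greedy`, and the default parameters are hypothetical. For simplicity it uses a constant step size and a fixed epsilon, so it does not satisfy the convergence conditions mentioned above.

```python
import random

def epsilon_greedy(q_row, epsilon, rng):
    """Random action with probability epsilon, else a greedy one (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])

def chain_step(state, action, n_states):
    """Hypothetical toy environment: action 1 moves right, action 0 moves left
    (with a wall at state 0). Reaching the last state ends the episode with reward 1."""
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done

def sarsa(n_states=5, n_actions=2, episodes=500,
          alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(Q[s], epsilon, rng)
        done = False
        while not done:
            s2, r, done = chain_step(s, a, n_states)
            # On-policy: the next action comes from the same policy being improved.
            a2 = epsilon_greedy(Q[s2], epsilon, rng)
            target = r if done else r + gamma * Q[s2][a2]
            Q[s][a] += alpha * (target - Q[s][a])
            s, a = s2, a2
    return Q
```

Calling `sarsa()` returns the learned table as a list of rows, one per state. Policy improvement is implicit: the epsilon-greedy policy is always taken with respect to the current Q, which is what makes this a form of generalized policy iteration.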

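n-step Sarsa sits between one-step TD and Monte Carlo: it bootstraps from q after n steps of observed rewards instead of after one step or at the end of the episode.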
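
A small sketch of the n-step Sarsa target, assuming the usual definition: the discounted sum of the n observed rewards plus gamma^n times the bootstrap value Q(S_{t+n}, A_{t+n}). The function name and arguments are hypothetical.

```python
def n_step_sarsa_target(rewards, gamma, bootstrap_q=0.0):
    """Return the n-step target, where n = len(rewards).

    rewards     : the n rewards observed after (S_t, A_t), in order
    gamma       : discount factor
    bootstrap_q : current estimate Q(S_{t+n}, A_{t+n}); use 0.0 if the
                  episode terminated within these n steps
    """
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    return g + (gamma ** len(rewards)) * bootstrap_q
```

With a single reward this is the one-step Sarsa (TD) target; with all rewards up to termination and `bootstrap_q=0.0` it is the Monte Carlo return.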

We can use eligibility traces to make it into an online algorithm, Sarsa(lambda); this is how it is done –> ALGORITHM
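
A minimal sketch of tabular Sarsa(lambda) in its backward view, assuming accumulating traces. This is not necessarily the exact algorithm the note links to. The chain environment (action 1 moves right, action 0 moves left, reward 1 on reaching the last state) and all parameter defaults are hypothetical.

```python
import random

def sarsa_lambda(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                 lam=0.8, epsilon=0.1, seed=0):
    """Tabular Sarsa(lambda) with accumulating eligibility traces on a toy chain."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]

    def choose(s):
        # epsilon-greedy with random tie-breaking
        if rng.random() < epsilon:
            return rng.randrange(2)
        best = max(Q[s])
        return rng.choice([a for a in range(2) if Q[s][a] == best])

    for _ in range(episodes):
        E = [[0.0, 0.0] for _ in range(n_states)]  # traces are reset every episode
        s, a, done = 0, choose(0), False
        while not done:
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = s2 == n_states - 1
            r = 1.0 if done else 0.0
            a2 = choose(s2)
            # One-step TD error, exactly as in plain Sarsa.
            delta = r - Q[s][a] + (0.0 if done else gamma * Q[s2][a2])
            E[s][a] += 1.0  # mark the pair just visited
            # Online update: every recently visited pair shares the error,
            # weighted by its trace, and all traces then decay.
            for i in range(n_states):
                for j in range(2):
                    Q[i][j] += alpha * delta * E[i][j]
                    E[i][j] *= gamma * lam
            s, a = s2, a2
    return Q
```

After a single episode, `sarsa_lambda(episodes=1)` already has a nonzero value at the start state, because the final reward is credited to every pair still carrying a trace. One-step Sarsa would only update the last pair in that episode.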

intuition for the benefit of Sarsa(lambda)