Policy evaluation algorithm: Sarsa, a TD algorithm that evaluates a policy by estimating its q function (the whole control algorithm built on it is also known as Sarsa).
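For reference, the one-step Sarsa update can be written as below. The notation (step size α, discount γ, time-indexed S, A, R) follows the usual textbook convention and is an assumption here, since these notes do not fix one.

```latex
Q(S_t, A_t) \leftarrow Q(S_t, A_t)
  + \alpha \left[ R_{t+1} + \gamma\, Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \right]
```

The name comes from the quintuple used in each update: (S_t, A_t, R_{t+1}, S_{t+1}, A_{t+1}).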
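With policy iteration, we can do on-policy control with Sarsa: evaluate the q function of the current policy, then improve the policy (e.g. ε-greedily) with respect to it. Using stochastic approximation theory, it can be shown to converge in the tabular case if the step sizes satisfy the Robbins-Monro conditions (Σ α_t = ∞, Σ α_t² < ∞) and the policy is greedy in the limit with infinite exploration (GLIE).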
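A minimal sketch of tabular Sarsa control, to make the loop concrete. The corridor environment, the `step` and `epsilon_greedy` helpers, and all hyperparameter values are hypothetical choices for illustration. It uses a constant step size and a fixed ε for simplicity, so it does not satisfy the decaying step-size and GLIE conditions needed for the convergence result.

```python
import random
from collections import defaultdict

# Hypothetical toy environment: a corridor of states 0..4, state 4 is terminal.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = left, 1 = right


def step(state, action):
    """Return (next_state, reward, done); every step costs -1."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, -1.0, done


def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    # Break ties between equally good actions at random.
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def sarsa(episodes=500, alpha=0.1, gamma=1.0, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # q[(state, action)], initialised to 0
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(q, state, epsilon, rng)
        done = False
        while not done:
            next_state, reward, done = step(state, action)
            # On-policy: the next action comes from the same policy we follow.
            next_action = epsilon_greedy(q, next_state, epsilon, rng)
            bootstrap = 0.0 if done else gamma * q[(next_state, next_action)]
            q[(state, action)] += alpha * (reward + bootstrap - q[(state, action)])
            state, action = next_state, next_action
    return q


if __name__ == "__main__":
    q = sarsa()
    for s in range(N_STATES - 1):
        print(s, {a: round(q[(s, a)], 2) for a in ACTIONS})
```

Policy evaluation and policy improvement are interleaved at every step: the q estimate is updated from one transition, and the ε-greedy policy immediately acts on the updated estimate.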
n-step Sarsa: sits between one-step TD (n = 1) and Monte Carlo (n → ∞, i.e. the full return up to the end of the episode).
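The n-step target makes the interpolation explicit. The notation below (G_{t:t+n} for the n-step return) is the common textbook one and is assumed here, not taken from these notes.

```latex
G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{n-1} R_{t+n}
            + \gamma^{n}\, Q(S_{t+n}, A_{t+n})

Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ G_{t:t+n} - Q(S_t, A_t) \right]
```

With n = 1 this is the one-step Sarsa target. If the episode ends before time t + n, the bootstrap term drops out and the target is the Monte Carlo return.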
We can use eligibility traces to turn it into an online algorithm, Sarsa(λ). This is how it is done -> ALGORITHM
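One possible tabular sketch of Sarsa(λ) with accumulating eligibility traces, not necessarily the exact algorithm these notes point to. The corridor environment, the helper names, and the hyperparameters (including λ = 0.8) are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict

# Hypothetical toy environment: a corridor of states 0..4, state 4 is terminal.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = left, 1 = right


def step(state, action):
    """Return (next_state, reward, done); every step costs -1."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, -1.0, done


def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def sarsa_lambda(episodes=500, alpha=0.1, gamma=1.0, lam=0.8,
                 epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        traces = defaultdict(float)  # eligibility traces, reset every episode
        state = 0
        action = epsilon_greedy(q, state, epsilon, rng)
        done = False
        while not done:
            next_state, reward, done = step(state, action)
            next_action = epsilon_greedy(q, next_state, epsilon, rng)
            bootstrap = 0.0 if done else gamma * q[(next_state, next_action)]
            td_error = reward + bootstrap - q[(state, action)]
            # Accumulating trace: bump the pair that was just visited.
            traces[(state, action)] += 1.0
            # Every recently visited pair shares in the current TD error,
            # then its trace decays by gamma * lambda.
            for key in list(traces):
                q[key] += alpha * td_error * traces[key]
                traces[key] *= gamma * lam
            state, action = next_state, next_action
    return q


if __name__ == "__main__":
    q = sarsa_lambda()
    for s in range(N_STATES - 1):
        print(s, {a: round(q[(s, a)], 2) for a in ACTIONS})
```

The update is online because each step only needs the current TD error and the traces. There is no need to wait n steps for a return, as n-step Sarsa does. Setting `lam=0` recovers one-step Sarsa.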