TD(lambda)

cosmos 18th July 2017 at 3:10pm
Temporal difference learning

aka TD(λ)

intro vid

Averaging n-step returns is better than committing to just one choice of n. This is what the TD(λ) algorithm achieves, efficiently!
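Concretely, the λ-return averages all the n-step returns G_t^(n) with geometrically decaying weights (the standard definition):

$$G_t^\lambda = (1-\lambda)\sum_{n=1}^{\infty} \lambda^{n-1} G_t^{(n)}$$

λ = 0 recovers one-step TD(0), and λ → 1 approaches the Monte Carlo return.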

TD(λ) can be computed by updating only at the end of the episode, as in Monte Carlo (the forward view). This isn't computationally efficient.

To make it computationally efficient, one can instead update by looking only to the past, via the backward view of the TD(λ) algorithm, which combines frequency with recency to define Eligibility traces.
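A minimal sketch of backward-view TD(λ) with accumulating eligibility traces, on a hypothetical 5-state random walk (all names and the environment are illustrative, not from the source):

```python
import random

def td_lambda(num_episodes=2000, alpha=0.1, gamma=1.0, lam=0.8, seed=0):
    """Tabular backward-view TD(lambda) on a 5-state random walk:
    states 0..4, episodes start at state 2, terminating off the left
    end gives reward 0, off the right end gives reward 1."""
    rng = random.Random(seed)
    n_states = 5
    V = [0.0] * n_states
    for _ in range(num_episodes):
        s = 2
        z = [0.0] * n_states              # eligibility traces, reset each episode
        done = False
        while not done:
            step = rng.choice([-1, 1])
            s_next = s + step
            if s_next < 0:                # terminated on the left
                r, v_next, done = 0.0, 0.0, True
            elif s_next >= n_states:      # terminated on the right
                r, v_next, done = 1.0, 0.0, True
            else:
                r, v_next = 0.0, V[s_next]
            delta = r + gamma * v_next - V[s]   # TD error at this step
            z[s] += 1.0                         # frequency: bump trace of visited state
            for i in range(n_states):
                V[i] += alpha * delta * z[i]    # every state updated in proportion to its trace
                z[i] *= gamma * lam             # recency: all traces decay each step
            s = s_next
    return V

values = td_lambda()
```

Every state's value is nudged by each TD error, weighted by how recently and how often that state was visited; for this chain the learned values approximate the true values 1/6, 2/6, ..., 5/6.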

See notes for more details and proofs of equivalence of the two interpretations.

There are newer online TD(λ) algorithms ("true online TD(λ)") that are exactly equivalent to the forward view!