TD(lambda)

cosmos 18th July 2017 at 3:10pm
Temporal difference learning

aka TD(λ)

intro vid

Averaging n-step returns is better than committing to just one choice of n. This is what the TD(λ) algorithm achieves, efficiently!
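Concretely, the λ-return averages all the n-step returns G_t^(n) with geometrically decaying weights (the standard definition):

$$G_t^\lambda = (1-\lambda)\sum_{n=1}^{\infty} \lambda^{n-1} G_t^{(n)}$$

λ = 0 recovers one-step TD(0), and λ → 1 approaches the Monte Carlo return.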

TD(λ) can be computed by updating only at the end of the episode, as in Monte Carlo (the forward view). This isn't computationally efficient.

To make it computationally efficient, one can instead update by looking only to the past, via the backward view of the TD(λ) algorithm, which combines frequency with recency to define Eligibility traces.
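A minimal sketch of backward-view TD(λ) with accumulating eligibility traces, on a hypothetical 5-state random walk (all names and the environment are illustrative, not from the source):

```python
import random

def td_lambda(num_episodes=2000, alpha=0.1, gamma=1.0, lam=0.8, seed=0):
    """Tabular backward-view TD(lambda) on a 5-state random walk:
    states 0..4, episodes start at state 2, terminating off the left
    end gives reward 0, off the right end gives reward 1."""
    rng = random.Random(seed)
    n_states = 5
    V = [0.0] * n_states
    for _ in range(num_episodes):
        s = 2
        z = [0.0] * n_states              # eligibility traces, reset each episode
        done = False
        while not done:
            step = rng.choice([-1, 1])
            s_next = s + step
            if s_next < 0:                # terminated on the left
                r, v_next, done = 0.0, 0.0, True
            elif s_next >= n_states:      # terminated on the right
                r, v_next, done = 1.0, 0.0, True
            else:
                r, v_next = 0.0, V[s_next]
            delta = r + gamma * v_next - V[s]   # TD error at this step
            z[s] += 1.0                         # frequency: bump trace of visited state
            for i in range(n_states):
                V[i] += alpha * delta * z[i]    # every state updated in proportion to its trace
                z[i] *= gamma * lam             # recency: all traces decay each step
            s = s_next
    return V

values = td_lambda()
```

Every state's value is nudged by each TD error, weighted by how recently and how often that state was visited; for this chain the learned values approximate the true values 1/6, 2/6, ..., 5/6.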

See notes for more details and proofs of equivalence of the two interpretations.

There are newer online TD(λ) algorithms ("true online TD(λ)") that are exactly equivalent to the forward view!