aka TD()
Averaging n-step returns is better than just one choice of . This is what The TD() algorithm achieves, efficiently!
TD lambda can be done by updating only at the end of the episde, like for Monte Carlo. This isn't computationally efficient
To make computationally efficient, one can update by looking only to the past with backward view of TD lambda algorithm, combines frequency with recency to define Eligibility traces
See notes for more details and proofs of equivalence of the two interpretations.
There are new exactly equivalent online TD(lambda) algorithms!