Policy gradient theorem

cosmos 10th November 2019 at 12:03am
Policy gradient method

Just looking at the proof of the policy gradient theorem. It's just backpropagation through time on the tree of rollouts, where the "activations" correspond to value functions.
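A rough sketch of that reading, under assumptions that are mine and not part of the theorem: a toy binary tree of rollouts with deterministic transitions, random rewards, and a tabular softmax policy with separate logits per node (the names `make_problem`, `forward`, `policy_gradient` and `finite_difference` are made up for illustration). The "forward pass" computes V and Q at every node, leaves first. The "backward pass" starts at the root and pushes down the adjoint dV(root)/dV(s), which turns out to be the discounted probability of reaching s. Multiplying that adjoint by the local term sum_a grad pi(a|s) Q(s,a) gives the gradient, which is the form the policy gradient theorem takes here. A finite-difference check is included as a sanity test.

```python
import itertools
import math
import random

GAMMA = 0.9      # discount factor (assumed for the toy problem)
DEPTH = 3        # horizon of the rollout tree
ACTIONS = (0, 1)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def make_problem(seed=0):
    """Hypothetical toy problem: states are action histories shorter than DEPTH."""
    rng = random.Random(seed)
    states = [s for d in range(DEPTH)
              for s in itertools.product(ACTIONS, repeat=d)]
    theta = {s: [rng.uniform(-1, 1) for _ in ACTIONS] for s in states}
    reward = {(s, a): rng.uniform(-1, 1) for s in states for a in ACTIONS}
    return theta, reward


def forward(theta, reward):
    """Forward pass: V and Q at every node, deepest nodes first."""
    V, Q = {}, {}
    for s in sorted(theta, key=len, reverse=True):
        pi = softmax(theta[s])
        for a in ACTIONS:
            child = s + (a,)
            # Nodes at depth DEPTH are terminal, so their value is 0.
            Q[(s, a)] = reward[(s, a)] + GAMMA * V.get(child, 0.0)
        V[s] = sum(pi[a] * Q[(s, a)] for a in ACTIONS)
    return V, Q


def policy_gradient(theta, reward):
    """Backward pass: propagate dV(root)/dV(s) from the root to the leaves."""
    V, Q = forward(theta, reward)
    adjoint = {(): 1.0}  # dV(root)/dV(root)
    grad = {}
    for s in sorted(theta, key=len):
        pi = softmax(theta[s])
        d = adjoint[s]  # discounted probability of reaching s
        # For softmax logits: sum_a dpi(a)/dtheta_b * Q(s,a) = pi(b) * (Q(s,b) - V(s))
        grad[s] = [d * pi[b] * (Q[(s, b)] - V[s]) for b in ACTIONS]
        for a in ACTIONS:
            child = s + (a,)
            if child in theta:
                # V(s) = sum_a pi(a) * (r + GAMMA * V(child)), so the chain rule
                # multiplies the adjoint by GAMMA * pi(a).
                adjoint[child] = d * GAMMA * pi[a]
    return grad


def finite_difference(theta, reward, eps=1e-6):
    """Central differences of V(root) with respect to every logit."""
    def root_value(s, b, delta):
        shifted = {k: list(v) for k, v in theta.items()}
        shifted[s][b] += delta
        return forward(shifted, reward)[0][()]

    return {s: [(root_value(s, b, eps) - root_value(s, b, -eps)) / (2 * eps)
                for b in ACTIONS]
            for s in theta}


theta, reward = make_problem()
pg = policy_gradient(theta, reward)
fd = finite_difference(theta, reward)
max_err = max(abs(pg[s][b] - fd[s][b]) for s in theta for b in ACTIONS)
print("max |policy gradient - finite difference| =", max_err)
```

With shared parameters (e.g. a neural network policy) the same adjoints would be summed over all nodes that use a given parameter, which is where the sum over states weighted by discounted visitation comes from.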

https://twitter.com/guillefix/status/1193190931853389825