Policy gradient theorem

cosmos 10th November 2019 at 12:03am

Just looking at the proof of the policy gradient theorem. It's essentially backpropagation through time on the tree of rollouts, where the "activations" correspond to value functions.
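
One way to make the analogy concrete is to write out the recursion the proof relies on. This is a sketch in the discounted setting; the notation is assumed here rather than taken from the note: policy $\pi_\theta$, transition kernel $P$, reward $r$, discount $\gamma$, and value functions $V^\pi$, $Q^\pi$.

```latex
% Definitions (assumed notation, discounted setting)
V^\pi(s) = \sum_a \pi_\theta(a \mid s)\, Q^\pi(s,a),
\qquad
Q^\pi(s,a) = r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V^\pi(s')

% Product rule at one node of the rollout tree:
% a local term plus the gradient passed back from the successor states
\nabla_\theta V^\pi(s)
  = \sum_a \nabla_\theta \pi_\theta(a \mid s)\, Q^\pi(s,a)
  + \gamma \sum_a \pi_\theta(a \mid s) \sum_{s'} P(s' \mid s,a)\, \nabla_\theta V^\pi(s')

% Unrolling the recursion over time steps from the start state s_0
\nabla_\theta V^\pi(s_0)
  = \sum_{t=0}^{\infty} \gamma^t \sum_s \Pr(s_t = s \mid s_0, \pi_\theta)
    \sum_a \nabla_\theta \pi_\theta(a \mid s)\, Q^\pi(s,a)
```

Read this way, the second equation looks like one step of backpropagation through time: each state contributes a local gradient in which $Q^\pi$ plays the role of the stored activation, and the remaining term carries the gradient back from the children in the tree, weighted by $\gamma\, \pi_\theta(a \mid s)\, P(s' \mid s,a)$. The last line is the policy gradient theorem in its unnormalized discounted form.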

https://twitter.com/guillefix/status/1193190931853389825