Just looking at proof of policy gradient theorem. It's just backpropagation through time on the tree of rollouts, where "activations" correspond to value functions.
https://twitter.com/guillefix/status/1193190931853389825