Rollout algorithm

cosmos 17th July 2017 at 3:30pm
Decision-time planning

A rollout algorithm is a form of decision-time planning in which the value of each possible action is estimated by Monte Carlo sampling: we simulate many future trajectories that start with that action, compute the cumulative reward of each (see Reinforcement learning), and average the results. We then usually choose the action with the highest estimated value, but with some probability we may choose a different one, to allow some exploration.
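The procedure can be sketched in a few lines of Python. This is only an illustrative sketch under assumptions that are not part of the note: the simulator `step`, the `rollout_policy` used to continue each trajectory after the first action, the epsilon-greedy form of exploration, and the toy `chain_step` environment are all hypothetical names and choices made for the example.

```python
import random

def rollout_return(step, state, action, rollout_policy, horizon, gamma, rng):
    """Simulate one trajectory that starts with `action`; return its discounted cumulative reward."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        state, reward, done = step(state, action, rng)
        total += discount * reward
        discount *= gamma
        if done:
            break
        # After the first action, follow the (assumed) rollout policy.
        action = rollout_policy(state, rng)
    return total

def rollout_action(step, state, actions, rollout_policy,
                   n_rollouts=200, horizon=20, gamma=1.0, epsilon=0.0, rng=None):
    """Estimate each action's value by averaging sampled returns, then pick an action."""
    rng = rng or random.Random()
    estimates = {}
    for a in actions:
        returns = [rollout_return(step, state, a, rollout_policy, horizon, gamma, rng)
                   for _ in range(n_rollouts)]
        estimates[a] = sum(returns) / n_rollouts
    # With probability epsilon explore; otherwise take the best estimated action.
    if rng.random() < epsilon:
        return rng.choice(list(actions)), estimates
    return max(estimates, key=estimates.get), estimates

# Hypothetical toy environment: a chain of states 0..4.
# Reaching state 4 gives reward 1 and ends the episode; reaching 0 ends it with reward 0.
def chain_step(state, action, rng):
    nxt = state + action
    if nxt == 4:
        return nxt, 1.0, True
    if nxt == 0:
        return nxt, 0.0, True
    return nxt, 0.0, False

def random_policy(state, rng):
    return rng.choice([-1, 1])

if __name__ == "__main__":
    best, values = rollout_action(chain_step, 2, [-1, 1], random_policy,
                                  rng=random.Random(0))
    print(best, values)
```

Setting `epsilon` above zero makes the choice occasionally random, which corresponds to the exploration mentioned above. The estimates are only as good as the number of rollouts and the horizon allow.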

Monte Carlo tree search is an example of this.