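A classical decision-time planning algorithm in artificial intelligence. For each state encountered, a large tree of possible continuations is considered. The approximate value function is applied to the leaf nodes, and the resulting values are backed up toward the current state at the root. The action with the best backed-up value is then chosen as the current action, and the backed-up values are discarded.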
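The lookahead described above can be sketched in a few lines of Python. This is only a minimal illustration under simplifying assumptions, not the textbook's pseudocode: the model is deterministic, the tree is expanded to a fixed depth, and the names `MODEL`, `v_hat`, `backed_up_value`, and `greedy_action` are hypothetical ones invented for this example.

```python
from typing import Callable, Dict, Tuple

State = int
Action = str
# Hypothetical deterministic model: state -> {action: (reward, next_state)}.
# States with no entry are treated as terminal.
Model = Dict[State, Dict[Action, Tuple[float, State]]]


def backed_up_value(model: Model, v_hat: Callable[[State], float],
                    state: State, depth: int, gamma: float = 1.0) -> float:
    """Value of `state` computed from a lookahead tree of the given depth."""
    actions = model.get(state, {})
    if depth == 0 or not actions:
        # Leaf node: fall back on the approximate value function.
        return v_hat(state)
    # Interior node: back up the best one-step return over all actions.
    return max(
        reward + gamma * backed_up_value(model, v_hat, nxt, depth - 1, gamma)
        for reward, nxt in actions.values()
    )


def greedy_action(model: Model, v_hat: Callable[[State], float],
                  state: State, depth: int, gamma: float = 1.0) -> Action:
    """Pick the root action with the highest backed-up value (depth >= 1)."""
    actions = model[state]

    def action_value(action: Action) -> float:
        reward, nxt = actions[action]
        return reward + gamma * backed_up_value(model, v_hat, nxt, depth - 1, gamma)

    return max(actions, key=action_value)


# Toy problem (an assumption for illustration): "right" pays 1 immediately,
# while "left" pays nothing now but leads to a reward of 10 one step later.
MODEL: Model = {
    0: {"left": (0.0, 1), "right": (1.0, 2)},
    1: {"go": (10.0, 3)},
    2: {"go": (0.0, 3)},
}


def v_hat(state: State) -> float:
    """A deliberately uninformative approximate value function."""
    return 0.0


print(greedy_action(MODEL, v_hat, 0, depth=1))  # right
print(greedy_action(MODEL, v_hat, 0, depth=2))  # left
```

With a one-step lookahead the uninformative `v_hat` makes the immediate reward look best, while a two-step lookahead finds the delayed reward. Searching deeper compensates for an imperfect value function, at the cost of a larger tree.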
See Section 8.9 of Sutton and Barto, Reinforcement Learning: An Introduction.