Used in different ways in different fields/contexts
Methods that, given a model of the environment, try to find optimal policies (what qualifies depends on the definition of model).
Solving the RL (prediction/control) problem in the context of a fully known MDP. It also refers to approximate solutions of the problem. See Model-based reinforcement learning
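A minimal sketch of planning with a fully known MDP, using value iteration. The two-state MDP, the table `P`, and the names `q_value` and `value_iteration` are made up for illustration and do not come from any reference.

```python
# Hypothetical 2-state, 2-action MDP (an assumption for illustration).
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9  # discount factor


def q_value(V, s, a):
    """Expected one-step return of taking action a in state s, then following V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def value_iteration(tol=1e-10):
    """Sweep over all states with the Bellman optimality backup until V stops changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(V, s, a) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}
    return V, policy


V, policy = value_iteration()
```

Because the model is known exactly, no interaction with the environment is needed; the policy is computed from `P` alone.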
Neural Mechanisms of Hierarchical Planning in a Virtual Subway Network
See Sutton-Barto, chapter 8.
–> Model-free reinforcement learning methods can be used for planning, when trained using simulated experience
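As a rough illustration of the point above, the sketch below applies the ordinary one-step Q-learning update to transitions drawn from a model instead of the real environment (in the spirit of random-sample one-step tabular Q-planning). The model `sample_model`, the constants, and the state/action sets are hypothetical.

```python
import random

GAMMA = 0.9   # discount factor
ALPHA = 0.1   # step size
STATES = [0, 1]
ACTIONS = [0, 1]


def sample_model(s, a):
    """Hypothetical deterministic sample model: returns (reward, next_state)."""
    if a == 1:
        return (1.0 if s == 0 else 2.0), 1
    return 0.0, 0


def q_planning(n_updates=20000, seed=0):
    """Q-learning updates driven purely by simulated experience."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(n_updates):
        # Pick a state-action pair at random and query the model for its outcome.
        s = rng.choice(STATES)
        a = rng.choice(ACTIONS)
        r, s2 = sample_model(s, a)
        # Same update a model-free agent would apply to real experience.
        target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
    return Q


Q = q_planning()
greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
```

The learning rule itself never touches the model's internals; it only sees sampled transitions, which is what makes a model-free method usable as a planner.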
When planning occurs in parallel with, and in a sense independently of, other learning/acting/decision processes.
Planning which occurs at the moment of deciding which action to take, every time a new state is visited. The backups performed focus forward from the current state, on the states which most affect its value (i.e. those that are likely to occur in the future).
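One possible sketch of this kind of planning is a depth-limited lookahead that is recomputed from scratch at every visited state. The five-state chain, the function names, and the fixed depth are assumptions chosen for illustration; they are not taken from Sutton-Barto.

```python
GAMMA = 0.9
ACTIONS = [0, 1]  # 0 = move left, 1 = move right


def model(s, a):
    """Hypothetical deterministic model of a 5-state chain: returns (reward, next_state).

    Reward 1.0 is given on entering state 4 from another state.
    """
    s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    r = 1.0 if (s2 == 4 and s != 4) else 0.0
    return r, s2


def lookahead_value(s, depth):
    """Best discounted return reachable from s within `depth` steps."""
    if depth == 0:
        return 0.0
    return max(
        r + GAMMA * lookahead_value(s2, depth - 1)
        for r, s2 in (model(s, a) for a in ACTIONS)
    )


def plan_action(s, depth=4):
    """Plan forward from the current state only, then return the best first action."""
    def backed_up(a):
        r, s2 = model(s, a)
        return r + GAMMA * lookahead_value(s2, depth - 1)
    return max(ACTIONS, key=backed_up)


# Act in the (here identical) environment, replanning at every visited state.
s, path = 0, [0]
while s != 4:
    _, s = model(s, plan_action(s))
    path.append(s)
```

Nothing is stored between decisions: the backups are spent only on states reachable from the current one, and the result is discarded once the action is chosen.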