Exploration vs exploitation in model-free reinforcement learning
Methods to ensure exploration
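- ϵ-greediness: with probability ϵ take a random action, otherwise take the greedy action.
- Exploration bonus: increase the reward of transitions that are rarely visited, similar in spirit to UCB (upper confidence bound).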
- Optimistic initialization: initialize the expected value of states to be considerably higher than what one expects, so that the greedy policy tries to visit them before learning their true values.
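
The three methods above can be compared on a toy problem. The sketch below is only an illustration under simplifying assumptions: it uses a single-state k-armed bandit with 0/1 rewards rather than a full RL environment, a UCB1-style bonus added at action-selection time, and a constant step size. The names `epsilon_greedy`, `ucb_action`, `run_bandit`, `probs`, `alpha` and `c` are hypothetical and chosen for this example.

```python
import math
import random


def epsilon_greedy(q, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=lambda a: q[a])  # ties go to the lowest index


def ucb_action(q, counts, t, c=2.0):
    """Pick the action maximizing value estimate plus a bonus that shrinks with visits."""
    for a, n in enumerate(counts):
        if n == 0:
            return a  # untried actions are selected first
    return max(
        range(len(q)),
        key=lambda a: q[a] + c * math.sqrt(math.log(t) / counts[a]),
    )


def run_bandit(probs, steps, select, q_init=0.0, alpha=0.1, seed=0):
    """Run a bandit whose arm a pays 1 with probability probs[a], else 0.

    q_init above the maximum possible reward (1) gives optimistic initialization.
    """
    rng = random.Random(seed)
    q = [q_init] * len(probs)
    counts = [0] * len(probs)
    for t in range(1, steps + 1):
        a = select(q, counts, t, rng)
        r = 1.0 if rng.random() < probs[a] else 0.0
        counts[a] += 1
        q[a] += alpha * (r - q[a])  # constant step-size update of the estimate
    return q, counts


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.8]  # assumed payout probabilities
    strategies = {
        "greedy, q_init=0": (lambda q, n, t, rng: epsilon_greedy(q, 0.0, rng), 0.0),
        "epsilon-greedy, eps=0.1": (lambda q, n, t, rng: epsilon_greedy(q, 0.1, rng), 0.0),
        "UCB bonus": (lambda q, n, t, rng: ucb_action(q, n, t), 0.0),
        "greedy, optimistic q_init=5": (lambda q, n, t, rng: epsilon_greedy(q, 0.0, rng), 5.0),
    }
    for name, (select, q_init) in strategies.items():
        _, counts = run_bandit(probs, 1000, select, q_init=q_init)
        print(name, counts)
```

In this setup a purely greedy agent starting from zero estimates never leaves the first arm, since its estimate can never drop below the untouched zeros of the others. With optimistic initial values the same greedy rule is pushed onto every arm at least once, because each pull lowers that arm's estimate below the untried ones.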