Exploration-exploitation trade-off

cosmos 18th July 2017 at 2:25pm
Reinforcement learning

Exploration vs exploitation, in Model-free reinforcement learning


Methods to ensure exploration

  • ϵ-greediness (with probability ϵ take a random action, otherwise act greedily)
  • Exploration bonus: increase the reward of transitions that are rarely visited, as in UCB (upper confidence bound).
  • Optimistic initialization: initialize the expected value of states to be considerably higher than what one expects, so that the greedy policy tries to visit them before learning their true values.
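
The three methods above can be compared on a toy problem. What follows is only a minimal sketch, assuming a three-armed Bernoulli bandit (rewards are 0 or 1) rather than a full reinforcement learning task. All names (run_bandit, epsilon_greedy, ucb, pure_greedy, win_probs, step_size) are made up for illustration. The exploration bonus is assumed to take the UCB1-style form c * sqrt(ln t / n), and on a bandit it is added to the value estimate at selection time rather than to a transition reward. A constant step size is assumed so that optimistic initial values decay gradually instead of being overwritten by the first sample.

```python
import math
import random


def greedy(values, rng):
    """Pick an arm with the highest score, breaking ties at random."""
    best = max(values)
    return rng.choice([a for a, v in enumerate(values) if v == best])


def epsilon_greedy(epsilon):
    """With probability epsilon explore uniformly, otherwise act greedily."""
    def select(values, counts, t, rng):
        if rng.random() < epsilon:
            return rng.randrange(len(values))
        return greedy(values, rng)
    return select


def ucb(c):
    """Greedy on value estimate plus a bonus that shrinks with visit count."""
    def select(values, counts, t, rng):
        for arm, n in enumerate(counts):
            if n == 0:
                return arm  # an untried arm is treated as maximally uncertain
        scores = [v + c * math.sqrt(math.log(t) / n)
                  for v, n in zip(values, counts)]
        return greedy(scores, rng)
    return select


def pure_greedy(values, counts, t, rng):
    """No explicit exploration; relies on optimistic initial values."""
    return greedy(values, rng)


def run_bandit(win_probs, steps, select, initial_value=0.0,
               step_size=0.1, seed=0):
    """Run one selection rule on a Bernoulli bandit (hypothetical setup)."""
    rng = random.Random(seed)
    values = [initial_value] * len(win_probs)  # estimated value of each arm
    counts = [0] * len(win_probs)              # number of pulls of each arm
    for t in range(1, steps + 1):
        arm = select(values, counts, t, rng)
        reward = 1.0 if rng.random() < win_probs[arm] else 0.0
        counts[arm] += 1
        # Constant step size: move the estimate a fraction toward the reward.
        values[arm] += step_size * (reward - values[arm])
    return values, counts


PROBS = [0.2, 0.5, 0.8]  # assumed win probability of each arm
for name, select, init in [
    ("epsilon-greedy", epsilon_greedy(0.1), 0.0),
    ("UCB bonus", ucb(1.0), 0.0),
    ("optimistic init", pure_greedy, 5.0),  # 5.0 is far above the max reward of 1
]:
    values, counts = run_bandit(PROBS, 1000, select, initial_value=init)
    print(name, counts)
```

Since rewards never exceed 1, an initial value of 5.0 guarantees that every pull lowers the pulled arm's estimate, so the purely greedy rule is pushed to try every arm before it settles.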