Exploration-exploitation trade-off: Cosmos — All that is, or was, or ever will be

Exploration-exploitation trade-off

cosmos 18th July 2017 at 2:25pm

Methods to ensure exploration

$\epsilon$-greediness: with probability $\epsilon$ take a random action, otherwise take the greedy action.
Exploration bonus: increase the reward of transitions that have rarely been visited, as in upper confidence bound (UCB) methods.
Optimistic initialization: initialize the expected value of states to be considerably higher than what one expects, so that the greedy policy tries to visit them before learning their true values.
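
The three methods above can be sketched on a multi-armed bandit, which is a simplifying assumption: the note speaks of states and transitions, while the sketch has only arms. All names (`epsilon_greedy`, `ucb`, `greedy`, `run`) are hypothetical, the UCB1-style bonus $c\sqrt{\ln t / n_a}$ is just one common choice of exploration bonus, and the constant step size is chosen so that optimistic initial values fade gradually.

```python
import math
import random


def greedy(q, counts, t, rng):
    """Pick the arm with the highest current value estimate."""
    return max(range(len(q)), key=lambda a: q[a])


def epsilon_greedy(q, counts, t, rng, epsilon=0.1):
    """With probability epsilon pick a random arm, otherwise the greedy arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return greedy(q, counts, t, rng)


def ucb(q, counts, t, rng, c=2.0):
    """Greedy on the estimate plus a bonus that shrinks as an arm is visited."""
    for a, n in enumerate(counts):
        if n == 0:
            return a  # visit every arm once so the bonus is defined
    return max(
        range(len(q)),
        key=lambda a: q[a] + c * math.sqrt(math.log(t) / counts[a]),
    )


def run(select, true_means, steps=2000, initial_value=0.0, step_size=0.1, seed=0):
    """Play a Gaussian bandit (assumed unit variance) with the given selection rule."""
    rng = random.Random(seed)
    q = [initial_value] * len(true_means)  # value estimates
    counts = [0] * len(true_means)         # visits per arm
    for t in range(1, steps + 1):
        a = select(q, counts, t, rng)
        reward = rng.gauss(true_means[a], 1.0)
        counts[a] += 1
        q[a] += step_size * (reward - q[a])  # constant step-size update
    return q, counts


if __name__ == "__main__":
    means = [0.2, 0.5, 1.0]  # hypothetical true arm means
    print("epsilon-greedy:", run(epsilon_greedy, means)[1])
    print("UCB bonus:     ", run(ucb, means)[1])
    # Optimistic initialization: a purely greedy rule with inflated initial values.
    print("optimistic:    ", run(greedy, means, initial_value=5.0)[1])
```

Optimistic initialization needs no selection rule of its own here: it is the plain `greedy` rule run with `initial_value` set well above any plausible reward, so every arm looks attractive until it has been tried enough for its estimate to fall.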