Q function

cosmos 15th July 2017 at 6:43pm
Value function

aka action-value function

Although state-values suffice to define optimality, it will prove to be useful to define action-values. Given a state $s$, an action $a$ and a policy $\pi$, the action-value of the pair $(s,a)$ under $\pi$ is defined by

$$Q^\pi(s,a) = E[R \mid s, a, \pi],$$

where, now, $R$ stands for the random return associated with first taking action $a$ in state $s$ and following $\pi$ thereafter.
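The definition can be illustrated with a minimal Monte Carlo sketch that estimates $Q^\pi(s,a)$ by averaging sampled returns. The 4-state chain, its rewards, the discount factor `GAMMA = 0.9` and the "move right 90% of the time" policy are all hypothetical choices made for illustration. The note itself does not specify an environment or a discount.

```python
import random

# Hypothetical toy MDP (not from the note): states 0..3 on a line, state 3 is terminal.
# Action 0 moves left (state 0 is a wall), action 1 moves right.
# Entering the terminal state pays reward 1; every other step pays 0.
GAMMA = 0.9      # assumed discount factor
TERMINAL = 3

def step(state, action):
    """Deterministic transition: returns (next_state, reward)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward

def policy(state, rng):
    """Stochastic policy pi (ignores the state): right with probability 0.9, else left."""
    return 1 if rng.random() < 0.9 else 0

def sample_return(state, action, rng):
    """One sample of R: take `action` in `state`, then follow pi until termination."""
    total, discount = 0.0, 1.0
    while True:
        state, reward = step(state, action)
        total += discount * reward
        discount *= GAMMA
        if state == TERMINAL:
            return total
        action = policy(state, rng)

def mc_q(state, action, episodes=20000, seed=0):
    """Monte Carlo estimate of Q^pi(s, a) = E[R | s, a, pi]: the mean of sampled returns."""
    rng = random.Random(seed)
    return sum(sample_return(state, action, rng) for _ in range(episodes)) / episodes

print(mc_q(2, 1))  # 1.0 (moving right from state 2 terminates immediately with reward 1)
print(mc_q(1, 1))  # approximately 0.88
```

For this chain the exact value of $Q^\pi(1, \text{right})$ works out to about 0.8797. The estimate approaches it as `episodes` grows.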

Video: Bellman equation for action-value function
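A common way to write this equation uses symbols that this note does not define: a one-step reward $r$, a discount factor $\gamma$, the next state $s'$, and the next action $a' \sim \pi(\cdot \mid s')$. Split the return into the first reward plus the discounted return from $s'$ onward, $R = r + \gamma R'$. Because $E[R' \mid s', a', \pi] = Q^\pi(s', a')$, this gives

$$Q^\pi(s,a) = E\left[\, r + \gamma\, Q^\pi(s', a') \mid s, a, \pi \,\right].$$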

It is well known from the theory of MDPs that, given $Q$ for an optimal policy, we can always choose optimal actions (and thus act optimally) by simply choosing the action with the highest value in each state. The action-value function of such an optimal policy is called the optimal action-value function and is denoted by $Q^*$.
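Here is a minimal sketch of this greedy choice, $\pi^*(s) = \arg\max_a Q^*(s,a)$, which assumes a small table of $Q^*$ values is already available. The states, the action names and the numbers in the hypothetical `Q_STAR` table are made up for illustration.

```python
# Hypothetical optimal action-values for a toy problem with 3 states and 2 actions.
Q_STAR = {
    0: {"left": 0.73, "right": 0.81},
    1: {"left": 0.73, "right": 0.90},
    2: {"left": 0.81, "right": 1.00},
}

def greedy_action(q, state):
    """Action with the highest Q-value in `state` (ties go to the first action listed)."""
    actions = q[state]
    return max(actions, key=actions.get)

def greedy_policy(q):
    """Map every state to its greedy action; with q = Q*, this is an optimal policy."""
    return {s: greedy_action(q, s) for s in q}

print(greedy_policy(Q_STAR))  # {0: 'right', 1: 'right', 2: 'right'}
```

Acting this way only needs a lookup and a `max` per state. No model of the environment's transitions is required once $Q^*$ is known.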

Video: approximating the Q function with a neural network (NN), defining a loss function, and then minimizing it with gradient descent.
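The loss-plus-gradient-descent idea can be sketched under several assumptions. First, a linear model $Q_w(s,a) = w \cdot \phi(s,a)$ with one-hot features stands in for the neural network. Second, the data is a hypothetical batch holding every transition of a toy 4-state chain. Third, the loss is the squared error to a Q-learning-style target $y = r + \gamma \max_{a'} Q_w(s',a')$, with $y$ held constant when differentiating. Names such as `phi`, `loss_and_grad` and `ALPHA` are illustrative and do not come from any library.

```python
GAMMA = 0.9                      # assumed discount factor
ALPHA = 0.5                      # assumed learning rate
STATES, ACTIONS, TERMINAL = [0, 1, 2], [0, 1], 3
N_FEATURES = len(STATES) * len(ACTIONS)

def step(s, a):
    """Toy chain: action 0 = left (wall at 0), 1 = right; entering state 3 pays 1."""
    s2 = max(0, s - 1) if a == 0 else s + 1
    return s2, (1.0 if s2 == TERMINAL else 0.0)

def phi(s, a):
    """One-hot features for (s, a); a neural network would learn its own representation."""
    x = [0.0] * N_FEATURES
    x[s * len(ACTIONS) + a] = 1.0
    return x

def q_value(w, s, a):
    """Linear stand-in for the network: Q_w(s, a) = w . phi(s, a)."""
    return sum(wi * xi for wi, xi in zip(w, phi(s, a)))

# Hypothetical training batch: one (s, a, s', r) transition per state-action pair.
BATCH = [(s, a, *step(s, a)) for s in STATES for a in ACTIONS]

def loss_and_grad(w):
    """L(w) = 0.5 * sum (y - Q_w(s, a))^2, with the target y held fixed (semi-gradient)."""
    loss, grad = 0.0, [0.0] * N_FEATURES
    for s, a, s2, r in BATCH:
        # No bootstrapping from the terminal state.
        y = r if s2 == TERMINAL else r + GAMMA * max(q_value(w, s2, b) for b in ACTIONS)
        err = y - q_value(w, s, a)
        loss += 0.5 * err ** 2
        for i, xi in enumerate(phi(s, a)):
            grad[i] -= err * xi          # dL/dw_i = -(y - Q_w(s, a)) * phi_i
    return loss, grad

w = [0.0] * N_FEATURES
for _ in range(500):                     # plain (full-batch) gradient descent
    loss, grad = loss_and_grad(w)
    w = [wi - ALPHA * gi for wi, gi in zip(w, grad)]

print([round(q_value(w, s, a), 3) for s in STATES for a in ACTIONS])
# [0.729, 0.81, 0.729, 0.9, 0.81, 1.0]
```

With one-hot features the linear model behaves like a lookup table. A neural network would replace `phi` and the dot product with learned layers, while the squared-error loss and the gradient step keep the same form.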

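The Q function can be learned by dynamic programming. However, that approach needs a model of the environment's transitions and rewards, and it quickly becomes too computationally expensive for large state spaces. Q-learning, a model-free reinforcement learning method, is often more practical because it learns from sampled transitions instead of a model. Q-learning is also off-policy: the agent does not need to follow the optimal policy while learning.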
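The off-policy point shows up in a tabular Q-learning sketch on a hypothetical 4-state chain, where the discount factor and step size are assumed values. The agent always acts uniformly at random. Even so, the update $Q(s,a) \leftarrow Q(s,a) + \alpha\,[\,r + \gamma \max_{a'} Q(s',a') - Q(s,a)\,]$ drives the table toward $Q^*$, and the greedy policy of $Q^*$ is to move right in every state.

```python
import random

GAMMA, ALPHA = 0.9, 0.5          # assumed discount factor and step size
STATES, ACTIONS, TERMINAL = [0, 1, 2], [0, 1], 3

def step(s, a):
    """Toy chain: action 0 = left (wall at 0), 1 = right; entering state 3 pays 1."""
    s2 = max(0, s - 1) if a == 0 else s + 1
    return s2, (1.0 if s2 == TERMINAL else 0.0)

def q_learning(episodes=2000, seed=0):
    """Tabular Q-learning with a uniformly random behaviour policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        s = rng.choice(STATES)
        while s != TERMINAL:
            a = rng.choice(ACTIONS)      # behaviour policy: random, not greedy
            s2, r = step(s, a)
            # The target uses the greedy (max) next action, whatever the agent does next.
            future = 0.0 if s2 == TERMINAL else max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += ALPHA * (r + GAMMA * future - q[(s, a)])
            s = s2
    return q

q = q_learning()
print({s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES})  # {0: 1, 1: 1, 2: 1}
```

Because this chain is deterministic, a constant step size is enough for the table to settle on $Q^*$. Stochastic environments usually call for a decaying step size.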
