Problem setup: focus on a continuous state space with a discrete action set. A model of how the state evolves is needed; it can be derived theoretically or learnt from data.
Learning algorithm: assume the value function can be approximated as a linear function of a set of features of the state (as in linear regression).
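To make the linear approximation concrete, here is a minimal sketch of one common way to fit such a value function, fitted value iteration. The notes do not name the algorithm, so this choice is an assumption. Everything specific in the sketch is also hypothetical and chosen only for illustration: the 1-D state in [-1, 1], the two actions, the reward `-s**2`, the noisy transition model in `sample_next_states`, and the features `[1, s, s^2]`.

```python
import numpy as np

def features(s):
    # Hypothetical feature map phi(s) = [1, s, s^2]; V(s) is approximated by theta . phi(s).
    s = np.asarray(s, dtype=float)
    return np.stack([np.ones_like(s), s, s**2], axis=-1)

def reward(s):
    # Assumed reward: highest (zero) at the origin.
    return -s**2

def sample_next_states(s, a, k, rng, noise_std=0.05):
    # Assumed model: the action shifts the state by 0.1 * a, plus Gaussian noise.
    noise = rng.normal(0.0, noise_std, size=k)
    return np.clip(s + 0.1 * a + noise, -1.0, 1.0)

def fitted_value_iteration(actions=(-1.0, 1.0), m=200, k=10,
                           gamma=0.9, n_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    states = rng.uniform(-1.0, 1.0, size=m)   # sampled continuous states
    Phi = features(states)                    # design matrix, shape (m, 3)
    theta = np.zeros(Phi.shape[1])
    for _ in range(n_iters):
        y = np.empty(m)
        for i, s in enumerate(states):
            # Estimate the value of each discrete action by sampling the model.
            q = [reward(s) + gamma * np.mean(
                     features(sample_next_states(s, a, k, rng)) @ theta)
                 for a in actions]
            y[i] = max(q)                     # Bellman backup target
        # Linear regression step: fit theta so that Phi @ theta is close to y.
        theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return theta

def value(theta, s):
    # Approximate value of state s under the fitted parameters.
    return features(s) @ theta

theta = fitted_value_iteration()
print("theta:", theta)
print("V(0.0) vs V(0.9):", value(theta, 0.0), value(theta, 0.9))
```

The discrete action set is what makes the `max` over actions a simple loop, and the model is what supplies the sampled next states. With this reward, the fitted value should come out higher near the origin than near the boundary. Convergence of this procedure is not guaranteed in general when the value function is only approximated.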