A type of reinforcement learning in a continuous state space
–> using dynamic programming
State transition probabilities: in LQR the dynamics are linear, x_{t+1} = A x_t + B u_t (plus noise), so the transition model is given by the matrices A and B. These can be obtained by linear regression from samples of the real or simulated dynamics of the system, or by linearizing a non-linear transition function derived from physics or other assumptions. This constitutes the linear model needed for LQR.
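As a concrete illustration (a minimal sketch, not from the notes; the function name and array shapes are assumptions), A and B can be fit by least squares on sampled transitions:

```python
# Minimal sketch: estimate A, B in x_{t+1} ≈ A x_t + B u_t by least
# squares from transition samples. Names and shapes are illustrative.
import numpy as np

def fit_linear_dynamics(states, actions, next_states):
    """states: (N, n), actions: (N, d), next_states: (N, n)."""
    Z = np.hstack([states, actions])      # regressors [x_t, u_t], shape (N, n + d)
    # Solve next_states ≈ Z @ theta in the least-squares sense
    theta, *_ = np.linalg.lstsq(Z, next_states, rcond=None)
    n = states.shape[1]
    A = theta[:n].T                       # (n, n): coefficients on the state
    B = theta[n:].T                       # (n, d): coefficients on the action
    return A, B
```

With enough sampled (x_t, u_t, x_{t+1}) triples, this yields the linear model needed for LQR.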
Goal: find the optimal policy, modelling the world as a finite-horizon MDP, which can be solved recursively using dynamic programming. In this case the optimal action turns out to be a linear function of the current state.
The recursive equation for calculating the optimal value function at time t, given its value at time t+1, is known as the discrete-time Riccati equation.
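To make the recursion concrete, here is a hedged sketch of the backward pass under a common cost-minimization convention (per-step cost x_tᵀ Q x_t + u_tᵀ R u_t, terminal cost xᵀ Qf x, dynamics x_{t+1} = A x_t + B u_t); the function and variable names are illustrative:

```python
# Sketch of finite-horizon LQR via the discrete-time Riccati recursion,
# assuming cost x^T Q x + u^T R u per step and terminal cost x^T Qf x.
import numpy as np

def lqr_gains(A, B, Q, R, Qf, T):
    """Backward pass; the optimal action at time t is u_t = K_t @ x_t."""
    P = Qf                                    # V_T(x) = x^T Qf x at the horizon
    Ks = [None] * T
    for t in reversed(range(T)):
        # Riccati update: P_t = Q + A^T P A - A^T P B (R + B^T P B)^{-1} B^T P A
        S = R + B.T @ P @ B
        K = -np.linalg.solve(S, B.T @ P @ A)  # time-varying linear feedback gain
        P = Q + A.T @ P @ A + A.T @ P @ B @ K
        Ks[t] = K
    return Ks
```

Note the recursion runs backwards from the horizon T, which is exactly the dynamic-programming structure above, and the result u_t = K_t x_t confirms that the optimal action is a linear function of the current state.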
Advantage over discretization methods
recap –> some comments: the optimal policy does not depend on the covariance of the noise, so we don't need to estimate it (only the mean dynamics matter).
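Why the covariance can be dropped (a sketch of the certainty-equivalence argument, under the assumptions of additive Gaussian noise x_{t+1} = A x_t + B u_t + w_t with w_t ~ N(0, Σ) and a quadratic value function V_{t+1}(x) = xᵀ P_{t+1} x + c_{t+1}):

```latex
% E[w_t] = 0 kills the cross terms, so only a constant trace term remains:
\mathbb{E}\big[V_{t+1}(Ax + Bu + w_t)\big]
  = (Ax + Bu)^\top P_{t+1}(Ax + Bu)
  + \operatorname{Tr}\!\big(P_{t+1}\Sigma\big) + c_{t+1}
```

The Σ term is a constant in u, so it does not affect the argmin: the optimal gains K_t are the same as in the noiseless problem, and only the value function shifts by a constant.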
Differential dynamic programming (DDP)
Turns out to be a form of local search: linearize the non-linear dynamics around the current trajectory, solve the resulting LQR problem, roll out the new controller to get a better trajectory, and repeat.
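A hedged sketch of one such local-search iteration (an iLQR-style simplification of DDP: it linearizes the dynamics around the current trajectory but omits the feedforward term and line search of full DDP; all names are illustrative):

```python
# Illustrative sketch of DDP-style local search: linearize the non-linear
# dynamics f around the current trajectory, run an LQR backward pass on
# the linearization, then roll out the updated controller.
import numpy as np

def jacobians(f, x, u, eps=1e-5):
    """Finite-difference Jacobians A = df/dx, B = df/du at (x, u)."""
    n, d = len(x), len(u)
    A = np.zeros((n, n)); B = np.zeros((n, d))
    for i in range(n):
        dx = np.zeros(n); dx[i] = eps
        A[:, i] = (f(x + dx, u) - f(x - dx, u)) / (2 * eps)
    for j in range(d):
        du = np.zeros(d); du[j] = eps
        B[:, j] = (f(x, u + du) - f(x, u - du)) / (2 * eps)
    return A, B

def ddp_iteration(f, xs, us, Q, R, Qf):
    """One local-search step around the trajectory (xs, us)."""
    T = len(us)
    P = Qf
    Ks = [None] * T
    for t in reversed(range(T)):                 # backward pass on linearization
        A, B = jacobians(f, xs[t], us[t])
        S = R + B.T @ P @ B
        K = -np.linalg.solve(S, B.T @ P @ A)
        P = Q + A.T @ P @ A + A.T @ P @ B @ K
        Ks[t] = K
    new_xs, new_us = [xs[0]], []                 # forward pass (roll-out)
    for t in range(T):
        u = us[t] + Ks[t] @ (new_xs[t] - xs[t])  # feedback on state deviation
        new_us.append(u)
        new_xs.append(f(new_xs[t], u))
    return new_xs, new_us
```

Each iteration re-linearizes around the trajectory produced by the previous controller, so the algorithm searches locally in trajectory space rather than solving the non-linear problem globally.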