A type of reinforcement learning in a continuous state space
–> using dynamic programming
State transition probabilities: in LQR the dynamics are linear, x_{t+1} = A x_t + B u_t (plus noise), so the transition model is given by the matrices A and B. These can be obtained by linear regression from samples of the real or simulated dynamics of the system, or by linearizing a non-linear transition function derived from physics or other assumptions. This constitutes the linear model needed for LQR.
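As a concrete illustration (a minimal sketch, not from the notes; the function name and array shapes are assumptions), A and B can be fit by least squares on sampled transitions:

```python
# Minimal sketch: estimate A, B in x_{t+1} ≈ A x_t + B u_t by least
# squares from transition samples. Names and shapes are illustrative.
import numpy as np

def fit_linear_dynamics(states, actions, next_states):
    """states: (N, n), actions: (N, d), next_states: (N, n)."""
    Z = np.hstack([states, actions])      # regressors [x_t, u_t], shape (N, n + d)
    # Solve next_states ≈ Z @ theta in the least-squares sense
    theta, *_ = np.linalg.lstsq(Z, next_states, rcond=None)
    n = states.shape[1]
    A = theta[:n].T                       # (n, n): coefficients on the state
    B = theta[n:].T                       # (n, d): coefficients on the action
    return A, B
```

With enough sampled (x_t, u_t, x_{t+1}) triples, this yields the linear model needed for LQR.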
Goal: find the optimal policy, modelling the world as a finite-horizon MDP, which can be solved recursively using dynamic programming. In this case the optimal action turns out to be a linear function of the current state.
The recursive equation for calculating the optimal value function at time t, given its value at time t+1, is known as the discrete-time Riccati equation.
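To make the recursion concrete, here is a hedged sketch of the backward pass under a common cost-minimization convention (per-step cost x_tᵀ Q x_t + u_tᵀ R u_t, terminal cost xᵀ Qf x, dynamics x_{t+1} = A x_t + B u_t); the function and variable names are illustrative:

```python
# Sketch of finite-horizon LQR via the discrete-time Riccati recursion,
# assuming cost x^T Q x + u^T R u per step and terminal cost x^T Qf x.
import numpy as np

def lqr_gains(A, B, Q, R, Qf, T):
    """Backward pass; the optimal action at time t is u_t = K_t @ x_t."""
    P = Qf                                    # V_T(x) = x^T Qf x at the horizon
    Ks = [None] * T
    for t in reversed(range(T)):
        # Riccati update: P_t = Q + A^T P A - A^T P B (R + B^T P B)^{-1} B^T P A
        S = R + B.T @ P @ B
        K = -np.linalg.solve(S, B.T @ P @ A)  # time-varying linear feedback gain
        P = Q + A.T @ P @ A + A.T @ P @ B @ K
        Ks[t] = K
    return Ks
```

Note the recursion runs backwards from the horizon T, which is exactly the dynamic-programming structure above, and the result u_t = K_t x_t confirms that the optimal action is a linear function of the current state.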
Advantage over discretization methods
recap –> some comments: the optimal policy does not depend on the covariance of the noise, so we don't need to estimate it (only the mean dynamics matter).
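Why the covariance can be dropped (a sketch of the certainty-equivalence argument, under the assumptions of additive Gaussian noise x_{t+1} = A x_t + B u_t + w_t with w_t ~ N(0, Σ) and a quadratic value function V_{t+1}(x) = xᵀ P_{t+1} x + c_{t+1}):

```latex
% E[w_t] = 0 kills the cross terms, so only a constant trace term remains:
\mathbb{E}\big[V_{t+1}(Ax + Bu + w_t)\big]
  = (Ax + Bu)^\top P_{t+1}(Ax + Bu)
  + \operatorname{Tr}\!\big(P_{t+1}\Sigma\big) + c_{t+1}
```

The Σ term is a constant in u, so it does not affect the argmin: the optimal gains K_t are the same as in the noiseless problem, and only the value function shifts by a constant.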
Differential dynamic programming (DDP)
Turns out to be a form of local search: linearize the non-linear dynamics around the current trajectory, solve the resulting LQR problem, roll out the new controller to get a better trajectory, and repeat.
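A hedged sketch of one such local-search iteration (an iLQR-style simplification of DDP: it linearizes the dynamics around the current trajectory but omits the feedforward term and line search of full DDP; all names are illustrative):

```python
# Illustrative sketch of DDP-style local search: linearize the non-linear
# dynamics f around the current trajectory, run an LQR backward pass on
# the linearization, then roll out the updated controller.
import numpy as np

def jacobians(f, x, u, eps=1e-5):
    """Finite-difference Jacobians A = df/dx, B = df/du at (x, u)."""
    n, d = len(x), len(u)
    A = np.zeros((n, n)); B = np.zeros((n, d))
    for i in range(n):
        dx = np.zeros(n); dx[i] = eps
        A[:, i] = (f(x + dx, u) - f(x - dx, u)) / (2 * eps)
    for j in range(d):
        du = np.zeros(d); du[j] = eps
        B[:, j] = (f(x, u + du) - f(x, u - du)) / (2 * eps)
    return A, B

def ddp_iteration(f, xs, us, Q, R, Qf):
    """One local-search step around the trajectory (xs, us)."""
    T = len(us)
    P = Qf
    Ks = [None] * T
    for t in reversed(range(T)):                 # backward pass on linearization
        A, B = jacobians(f, xs[t], us[t])
        S = R + B.T @ P @ B
        K = -np.linalg.solve(S, B.T @ P @ A)
        P = Q + A.T @ P @ A + A.T @ P @ B @ K
        Ks[t] = K
    new_xs, new_us = [xs[0]], []                 # forward pass (roll-out)
    for t in range(T):
        u = us[t] + Ks[t] @ (new_xs[t] - xs[t])  # feedback on state deviation
        new_us.append(u)
        new_xs.append(f(new_xs[t], u))
    return new_xs, new_us
```

Each iteration re-linearizes around the trajectory produced by the previous controller, so the algorithm searches locally in trajectory space rather than solving the non-linear problem globally.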