Linear quadratic regulation

cosmos 4th November 2016 at 2:43pm
Reinforcement learning in continuous state space

A type of Reinforcement learning in continuous state space

–> using Dynamic programming

intro

State transition probabilities: the dynamics are modelled as linear, s_{t+1} = A s_t + B a_t + w_t. The matrices A and B can be obtained by Linear regression from samples of the real or simulated dynamics of the system, or they can come from a linearization of a non-linear transition function derived from physics or other assumptions. This constitutes the linear model needed for LQR.
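As an illustration (not from the source), here is a minimal sketch of fitting such a linear model by least squares from observed transitions, assuming dynamics of the form s_{t+1} ≈ A s_t + B a_t; the function name fit_linear_model and the array layout are my own choices.

```python
import numpy as np

def fit_linear_model(states, actions, next_states):
    """Least-squares fit of A, B in s_{t+1} ~ A s_t + B a_t.

    states:      (T, n_s) array of states s_t
    actions:     (T, n_a) array of actions a_t
    next_states: (T, n_s) array of observed successors s_{t+1}
    """
    X = np.hstack([states, actions])                 # (T, n_s + n_a) regressors
    # Solve min_W ||X W - next_states||^2; W stacks [A^T; B^T]
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    n_s = states.shape[1]
    A = W[:n_s].T                                    # (n_s, n_s)
    B = W[n_s:].T                                    # (n_s, n_a)
    return A, B
```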

Reward function: quadratic in the state and action, R(s_t, a_t) = -(s_t^T U s_t + a_t^T V a_t), with U and V positive semi-definite, so large states and large actions are penalized.

Goal: find the optimal policy, modelling the world as a finite-horizon MDP, which can be solved recursively using Dynamic programming (computing the optimal value function backwards from the final time step). It turns out that, in this case, the optimal action is a linear function of the current state, a_t = L_t s_t, with a time-varying gain matrix L_t.

The recursive equation for calculating the optimal value function at time t, given its value at time t+1, is known as the discrete-time Riccati equation.
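As a sketch of the dynamic-programming solve (my illustration, not from the source), assuming dynamics s_{t+1} = A s_t + B a_t + w_t and the quadratic reward above, the backward pass below computes the time-varying gains L_t via the discrete-time Riccati update; the function name lqr_backward is illustrative.

```python
import numpy as np

def lqr_backward(A, B, U, V, horizon):
    """Finite-horizon LQR: return gains L_0..L_{T-1} so that the optimal action is a_t = L_t s_t."""
    Phi = U.copy()            # quadratic term of the value function at the final time step
    gains = [None] * horizon
    for t in reversed(range(horizon)):
        BtP = B.T @ Phi       # B^T Phi_{t+1}
        # Optimal linear policy at time t (maximizing the reward = minimizing the quadratic cost)
        L = -np.linalg.solve(V + BtP @ B, BtP @ A)
        # Discrete-time Riccati update:
        # Phi_t = U + A^T Phi_{t+1} A - A^T Phi_{t+1} B (V + B^T Phi_{t+1} B)^{-1} B^T Phi_{t+1} A
        Phi = U + A.T @ Phi @ A + A.T @ Phi @ B @ L
        gains[t] = L
    return gains
```

The gains come out time-varying, and they do not depend on the covariance of the noise w_t, which only shifts the value function by a constant.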

algorithm

Advantage over discretization methods: the policy is computed in closed form and scales to high-dimensional state spaces, whereas discretizing the state space suffers from the curse of dimensionality.

recap –> some comments: the covariance of the noise w_t is not needed; it does not affect the optimal policy (the gains L_t), only the constant term of the value function.

Differential dynamic programming (DDP)

Turns out to be a form of local search algorithm: repeatedly linearize the non-linear dynamics (and quadratize the cost) around the current nominal trajectory, solve the resulting LQR problem, roll the improved controller forward to get a new trajectory, and repeat until convergence.
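A rough sketch of this local search in its common iterative-LQR form (my illustration, not necessarily the exact variant meant here), written as minimization of a quadratic cost 0.5*(s^T U s + a^T V a) for non-linear dynamics s_{t+1} = f(s_t, a_t); regularization and line search, which practical implementations need, are omitted.

```python
import numpy as np

def jacobians(f, s, a, eps=1e-5):
    """Finite-difference Jacobians A = df/ds, B = df/da at (s, a)."""
    A = np.column_stack([(f(s + eps * e, a) - f(s - eps * e, a)) / (2 * eps)
                         for e in np.eye(s.size)])
    B = np.column_stack([(f(s, a + eps * e) - f(s, a - eps * e)) / (2 * eps)
                         for e in np.eye(a.size)])
    return A, B

def ilqr(f, s0, U, V, horizon, iters=30):
    """Local search: linearize around the nominal trajectory, solve the LQR, repeat."""
    n_a = V.shape[0]
    actions = [np.zeros(n_a) for _ in range(horizon)]
    for _ in range(iters):
        # Forward pass: nominal trajectory under the current controls
        states = [s0]
        for t in range(horizon):
            states.append(f(states[t], actions[t]))
        # Backward pass: quadratic value-function recursion around the nominal trajectory
        Vx, Vxx = U @ states[-1], U.copy()
        ks, Ks = [None] * horizon, [None] * horizon
        for t in reversed(range(horizon)):
            A, B = jacobians(f, states[t], actions[t])
            Qx  = U @ states[t] + A.T @ Vx
            Qu  = V @ actions[t] + B.T @ Vx
            Qxx = U + A.T @ Vxx @ A
            Quu = V + B.T @ Vxx @ B
            Qux = B.T @ Vxx @ A
            k = -np.linalg.solve(Quu, Qu)      # feedforward correction
            K = -np.linalg.solve(Quu, Qux)     # feedback gain
            Vx  = Qx + K.T @ Quu @ k + K.T @ Qu + Qux.T @ k
            Vxx = Qxx + K.T @ Quu @ K + K.T @ Qux + Qux.T @ K
            ks[t], Ks[t] = k, K
        # Roll the improved local controller forward to get the next nominal trajectory
        s = s0
        for t in range(horizon):
            actions[t] = actions[t] + ks[t] + Ks[t] @ (s - states[t])
            s = f(s, actions[t])
    return actions
```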

video