Partially-observable MDP: Cosmos — All that is, or was, or ever will be

Partially-observable MDP

cosmos 4th November 2016 at 2:43pm

A partially-observable Markov decision process (POMDP) is a Markov decision process where the state is only partially observable by the actor, so the policy can only depend on a function of the state that loses some of the state's information.

Kalman filters and LQG control

video

A type of reinforcement learning, where we don't observe the state explicitly!

We want to estimate the actual state, given noisy and incomplete measurements of it. One could use the method of marginalization, as used in Factor analysis models, but that is very computationally expensive. Instead we use a Kalman filter, whose underlying model turns out to be a Hidden Markov model with continuous states (linear dynamics with Gaussian noise).

Outline of Kalman filter

Predict step
Update step
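
The two steps above can be sketched for the simplest case, a scalar state. This is a minimal sketch, not the version from the video: it assumes a model x[t+1] = a*x[t] + w with w ~ N(0, q) and a measurement y[t] = c*x[t] + v with v ~ N(0, r). The names a, c, q, r and the function names are illustrative assumptions, not taken from these notes.

```python
# Scalar Kalman filter sketch (illustrative names, assumed model):
#   x[t+1] = a * x[t] + w,  w ~ N(0, q)   hidden state dynamics
#   y[t]   = c * x[t] + v,  v ~ N(0, r)   noisy measurement
# The belief over the state is a Gaussian, stored as (mean, var).

def predict(mean, var, a, q):
    """Predict step: push the current belief through the dynamics."""
    return a * mean, a * a * var + q


def update(mean, var, y, c, r):
    """Update step: condition the predicted belief on the measurement y."""
    gain = var * c / (c * c * var + r)        # Kalman gain
    new_mean = mean + gain * (y - c * mean)   # correct by the innovation
    new_var = (1.0 - gain * c) * var          # posterior variance
    return new_mean, new_var


def kalman_filter(ys, mean0, var0, a, c, q, r):
    """Run predict then update for each measurement; return all beliefs."""
    mean, var = mean0, var0
    beliefs = []
    for y in ys:
        mean, var = predict(mean, var, a, q)
        mean, var = update(mean, var, y, c, r)
        beliefs.append((mean, var))
    return beliefs
```

For example, kalman_filter([1.0, 1.0, 1.0], 0.0, 1.0, a=1.0, c=1.0, q=0.0, r=1.0) moves the mean from 0 towards 1 while the variance shrinks at every step, since with q = 0 each measurement only adds information.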

Intuition. I think this can be seen through the lens of Sufficient statistics

Kalman filter + LQR = LQG control (video: how to solve)

Separation principle of LQG control
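
A rough sketch of what separation means in the scalar case: the LQR gain is computed from the dynamics and costs only, the Kalman gain from the dynamics and noise variances only, and the controller simply applies the LQR gain to the state estimate. The model x[t+1] = a*x[t] + b*u[t] + w, y[t] = c*x[t] + v, the cost weights, and all function names below are assumptions for illustration, not taken from these notes.

```python
# Scalar LQG sketch (illustrative names, assumed model):
#   x[t+1] = a*x[t] + b*u[t] + w,  w ~ N(0, q)
#   y[t]   = c*x[t] + v,           v ~ N(0, r)
#   cost   = sum of state_cost * x^2 + control_cost * u^2

def lqr_gain(a, b, state_cost, control_cost, iters=500):
    """Steady-state LQR gain via Riccati iteration; uses no noise parameters."""
    p = state_cost
    for _ in range(iters):
        k = a * b * p / (control_cost + b * b * p)
        p = state_cost + a * p * (a - b * k)
    return a * b * p / (control_cost + b * b * p)


def kalman_gain(a, c, q, r, iters=500):
    """Steady-state Kalman gain; uses no cost parameters."""
    var = q
    gain = 0.0
    for _ in range(iters):
        pred = a * a * var + q                 # predicted variance
        gain = pred * c / (c * c * pred + r)   # Kalman gain
        var = (1.0 - gain * c) * pred          # posterior variance
    return gain


def lqg_step(x_hat, u_prev, y, a, b, c, k, l):
    """One controller step: filter the measurement, then act on the estimate."""
    pred = a * x_hat + b * u_prev          # predict with the known control
    x_hat = pred + l * (y - c * pred)      # update with the measurement
    u = -k * x_hat                         # LQR law applied to the estimate
    return x_hat, u
```

Note that lqr_gain never sees q or r, and kalman_gain never sees the cost weights, so the two designs can be done independently and then combined in lqg_step.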

recap

Other POMDP

In general, finding optimal policies for POMDPs is NP-hard (the finite-horizon problem is in fact PSPACE-complete).