Ridge regression

cosmos 19th January 2018 at 5:08pm
Kernel method, Linear regression

Linear regression with Tikhonov regularization.
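
Concretely, as a sketch in notation assumed here (data matrix $X \in \mathbb{R}^{N \times D}$, targets $y$, weights $w$), the objective being minimized is the least-squares loss plus an $\ell_2$ penalty:

$$\min_w \; \|y - Xw\|^2 + \lambda \, w^T w$$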

The constant (intercept) term doesn't contribute to model complexity, so it is typically not penalized.

Scaling input variables shouldn't change the model complexity, so we normalize them.

Estimate for ridge regression

via minimizing the regularized least-squares objective above (penalized maximum likelihood)

$w_{ridge} = (X^TX + \lambda I_D)^{-1} X^T y$

This is easier to solve than the normal equation in standard linear regression, since $X^TX + \lambda I_D$ is always invertible for $\lambda > 0$. $\lambda$ is actually a Lagrange multiplier: depending on its value, we are constraining $w$ to lie within a sphere around the origin. Differentiating the Lagrangian w.r.t. $\lambda$ recovers the constraint that $w^T w$ is less than a constant (see here).
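
A minimal runnable sketch of the closed-form estimate (NumPy; the function name `ridge_fit` and the random data are just placeholders, not from the note). It also standardizes the inputs, as discussed above, and centers $y$ so the intercept is effectively left unpenalized:

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    # Standardize columns so rescaling the inputs doesn't change the penalty.
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mu) / sigma
    # Center the targets so the intercept is handled separately (not penalized).
    yc = y - y.mean()
    D = Xs.shape[1]
    # Solve (X^T X + lambda I_D) w = X^T y rather than forming an explicit inverse.
    w = np.linalg.solve(Xs.T @ Xs + lam * np.eye(D), Xs.T @ yc)
    return w

# Example usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(size=100)
print(ridge_fit(X, y, lam=0.1))
```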

Also called $\ell_2$ regularization or weight decay.

In Bayesian terms, this corresponds to a Gaussian prior on the weights, and the ridge estimate is the Maximum a-posteriori (MAP) estimate.
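
A one-line sketch of the correspondence (assuming Gaussian noise with variance $\sigma^2$ and an isotropic zero-mean Gaussian prior with variance $\tau^2$ on the weights):

$$w_{MAP} = \arg\max_w \left[\log p(y \mid X, w) + \log p(w)\right] = \arg\min_w \left[\frac{1}{2\sigma^2}\|y - Xw\|^2 + \frac{1}{2\tau^2} w^T w\right],$$

which is the ridge objective with $\lambda = \sigma^2 / \tau^2$.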