Ridge regression

cosmos 19th January 2018 at 5:08pm
Kernel method, Linear regression

Linear regression with Tikhonov regularization.
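
Concretely, as a sketch in notation assumed here (data matrix $X \in \mathbb{R}^{N \times D}$, targets $y$, weights $w$), the objective being minimized is the least-squares loss plus an $\ell_2$ penalty:

$$\min_w \; \|y - Xw\|^2 + \lambda \, w^T w$$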

The constant (intercept) term doesn't contribute to model complexity, so it is typically not penalized.

Scaling input variables shouldn't change the model complexity, so we normalize them.

Estimate for ridge regression

via minimizing the regularized least-squares objective above (penalized maximum likelihood)

$w_{ridge} = (X^TX + \lambda I_D)^{-1} X^T y$

This is easier to solve than the normal equation in standard linear regression, since $X^TX + \lambda I_D$ is always invertible for $\lambda > 0$. $\lambda$ is actually a Lagrange multiplier: depending on its value, we are constraining $w$ to lie within a sphere around the origin. Differentiating the Lagrangian w.r.t. $\lambda$ recovers the constraint that $w^T w$ is less than a constant (see here).
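
A minimal runnable sketch of the closed-form estimate (NumPy; the function name `ridge_fit` and the random data are just placeholders, not from the note). It also standardizes the inputs, as discussed above, and centers $y$ so the intercept is effectively left unpenalized:

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    # Standardize columns so rescaling the inputs doesn't change the penalty.
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mu) / sigma
    # Center the targets so the intercept is handled separately (not penalized).
    yc = y - y.mean()
    D = Xs.shape[1]
    # Solve (X^T X + lambda I_D) w = X^T y rather than forming an explicit inverse.
    w = np.linalg.solve(Xs.T @ Xs + lam * np.eye(D), Xs.T @ yc)
    return w

# Example usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(size=100)
print(ridge_fit(X, y, lam=0.1))
```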

Also called $\ell_2$ regularization or weight decay.

In Bayesian terms, this corresponds to a Gaussian prior on the weights, and the ridge estimate is the Maximum a-posteriori (MAP) estimate.
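
A one-line sketch of the correspondence (assuming Gaussian noise with variance $\sigma^2$ and an isotropic zero-mean Gaussian prior with variance $\tau^2$ on the weights):

$$w_{MAP} = \arg\max_w \left[\log p(y \mid X, w) + \log p(w)\right] = \arg\min_w \left[\frac{1}{2\sigma^2}\|y - Xw\|^2 + \frac{1}{2\tau^2} w^T w\right],$$

which is the ridge objective with $\lambda = \sigma^2 / \tau^2$.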