Empirical risk minimization

cosmos 29th November 2017 at 2:20pm
Learning theory

Risk minimization (see Learning theory) requires knowing the joint probability distribution P(x,y), so one often uses the sample mean of the loss over the training data as an estimator of the risk (the expected loss). Minimizing this empirical quantity is called empirical risk minimization (ERM).

Depending on the form of the loss function, this optimization problem may be convex or non-convex. With the 0-1 loss the problem is non-convex, and finding its minimizer is NP-hard. Replacing it with a smooth surrogate loss can turn it into a convex problem, solvable by Gradient descent.
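As an illustration (not from the original notes), here is a minimal sketch of ERM for a linear classifier with labels in {-1, +1}: the non-convex 0-1 loss is swapped for the smooth, convex logistic loss, and the empirical risk is minimized by plain gradient descent. All names and the synthetic data below are hypothetical.

```python
# Minimal ERM sketch: linear classifier, logistic loss as a smooth
# surrogate for the 0-1 loss, optimized by gradient descent.
import numpy as np

def logistic_loss(w, X, y):
    # empirical risk: mean over the sample of log(1 + exp(-y * <w, x>))
    margins = y * (X @ w)
    return np.mean(np.log1p(np.exp(-margins)))

def logistic_grad(w, X, y):
    # gradient of the empirical risk: mean of -y * x * sigmoid(-y * <w, x>)
    margins = y * (X @ w)
    coef = -y / (1.0 + np.exp(margins))
    return (X * coef[:, None]).mean(axis=0)

def erm_gradient_descent(X, y, lr=0.1, steps=500):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * logistic_grad(w, X, y)
    return w

# toy usage with synthetic data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sign(X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=200))
w_hat = erm_gradient_descent(X, y)
print(logistic_loss(w_hat, X, y))
```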

Neural networks [2.1] : Training neural networks - empirical risk minimization

Empirical risk minimization is thus defined as the Optimization problem of minimizing

\frac{1}{m} \sum\limits_i l(f(x^{(i)}; \mathbf{\theta}), y^{(i)}) + \lambda \Omega(\mathbf{\theta})

where f is our Model (hypothesis), which depends on the model parameters \mathbf{\theta}; l is the Loss function; \Omega(\mathbf{\theta}) is the regularizer; and \lambda is a hyperparameter that balances the two terms.

When we add the regularizer, ERM is called structural risk minimization.
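As a sketch of the regularized objective above (again not from the original notes), the snippet below computes the structural risk for a linear model, assuming a squared loss and an L2 regularizer \Omega(\mathbf{\theta}) = ||\theta||^2; the function name and these choices are illustrative.

```python
# Structural risk: (1/m) * sum_i l(f(x^(i); theta), y^(i)) + lambda * Omega(theta)
import numpy as np

def structural_risk(theta, X, y, lam=1e-2):
    preds = X @ theta                           # f(x^(i); theta) for a linear model
    empirical_risk = np.mean((preds - y) ** 2)  # (1/m) sum_i squared loss
    regularizer = np.dot(theta, theta)          # Omega(theta) = ||theta||^2
    return empirical_risk + lam * regularizer   # trade-off controlled by lambda

# toy usage (illustrative)
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.1 * rng.normal(size=100)
print(structural_risk(np.zeros(3), X, y))
```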

Another video explanation
Another good lecture


Formally setting up a linear classifier to explore the problems of learning theory, using empirical risk minimization as the learning principle

Intro to ERM
Definition