Risk minimization (see Learning theory) requires knowing the joint probability distribution P(x, y), so one often uses the sample mean of the loss over the training set as an estimator for the expected risk. Minimizing this empirical quantity is called empirical risk minimization (ERM).
Depending on the form of the loss function, this optimization problem may be convex or non-convex. With the 0-1 loss function, the problem is non-convex, and finding its exact minimizer is NP-hard. Replacing the 0-1 loss with a smoothed, convex surrogate loss converts it into a convex problem, solvable by Gradient descent.
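As a minimal sketch of the surrogate-loss idea, the snippet below minimizes the logistic loss, a smooth convex surrogate for the 0-1 loss, by gradient descent on a one-dimensional classifier sign(w * x). The dataset and step size are illustrative assumptions, not from the source.

```python
import math

# Hypothetical toy data: (x_i, y_i) pairs with labels y in {-1, +1}.
data = [(0.5, 1), (1.5, 1), (-1.0, -1), (-2.0, -1)]

def logistic_risk(w):
    # Empirical risk under the logistic loss log(1 + exp(-y * w * x)),
    # a smooth convex upper bound on the 0-1 loss.
    return sum(math.log(1 + math.exp(-y * w * x)) for x, y in data) / len(data)

def grad(w):
    # Derivative of the logistic loss with respect to w, averaged over the data.
    return sum(-y * x / (1 + math.exp(y * w * x)) for x, y in data) / len(data)

w, lr = 0.0, 0.5
for _ in range(200):
    w -= lr * grad(w)
```

Because the surrogate is convex, gradient descent converges to its global minimum, whereas the 0-1 loss is piecewise constant and provides no usable gradient.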
Empirical risk minimization is thus defined as the Optimization problem of minimizing

    J(g) = (1/N) Σ_{i=1}^{N} L(y_i, g(x_i)) + λ C(g),

where g is our Model (hypothesis), which depends on the model parameters θ; L is the Loss function; C(g) is the regularizer; and λ is a hyperparameter that balances the two terms.
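This objective can be sketched directly in code. The example below assumes, purely for illustration, a linear model g(x) = θx, a squared-error loss, and an L2 regularizer C(g) = θ²; the data and λ are made up.

```python
# Hypothetical (x_i, y_i) training pairs with roughly y ≈ 2x.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
lam = 0.1  # λ, the hyperparameter balancing data fit against model complexity

def J(theta):
    # Empirical risk: sample mean of the squared-error loss L(y, g(x)).
    empirical_risk = sum((y - theta * x) ** 2 for x, y in data) / len(data)
    # L2 regularizer C(g) penalizing large parameter values.
    regularizer = theta ** 2
    return empirical_risk + lam * regularizer
```

With λ = 0 this reduces to plain ERM (here, ordinary least squares); a positive λ shrinks θ toward zero, trading a slightly worse fit for lower complexity.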
When we add the regularizer, ERM is called structural risk minimization.