Optimal Brain Damage (OBD). Deletes the weights whose removal is expected to increase the loss the least. It computes the Hessian of the loss (in practice only its diagonal) and estimates how much the loss would change if a given weight were set to 0, under a simple quadratic (second-order Taylor) approximation around the trained weights (ok for small weights I guess).
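A minimal sketch of the saliency computation, assuming the network is at a local minimum (so the gradient term vanishes) and that the Hessian is approximated by its diagonal, giving a saliency of 0.5 * h_kk * w_k^2 for weight k. The function names, the toy quadratic loss, and the numbers are all hypothetical and only there for illustration; in a real network the Hessian diagonal would have to be estimated, which is not shown here.

```python
import numpy as np

def obd_saliencies(weights, hessian_diag):
    """Estimated loss increase from zeroing each weight individually.

    Assumes a quadratic approximation at a minimum with a diagonal Hessian:
    delta_L_k ~= 0.5 * h_kk * w_k**2.
    """
    return 0.5 * hessian_diag * weights ** 2

def prune_lowest_saliency(weights, hessian_diag, n_prune):
    """Zero out the n_prune weights with the smallest saliency."""
    saliency = obd_saliencies(weights, hessian_diag)
    drop = np.argsort(saliency)[:n_prune]  # indices of least important weights
    pruned = weights.copy()
    pruned[drop] = 0.0
    return pruned, drop

# Toy setup (hypothetical): a loss that is exactly quadratic with a diagonal
# Hessian and a minimum at w_star, so the approximation is exact here.
w_star = np.array([0.1, -2.0, 0.5, 0.05])
h_diag = np.array([4.0, 1.0, 0.2, 10.0])

def toy_loss(w):
    return 0.5 * np.sum(h_diag * (w - w_star) ** 2)

saliency = obd_saliencies(w_star, h_diag)
pruned, dropped = prune_lowest_saliency(w_star, h_diag, n_prune=2)

print("saliencies:", saliency)
print("dropped indices:", dropped)
print("actual loss increase:", toy_loss(pruned) - toy_loss(w_star))
```

Note that the smallest weight is not automatically the first to go: a small weight sitting in a high-curvature direction can have a larger saliency than a bigger weight in a flat direction, which is the point of using the Hessian rather than magnitude alone.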
Application to PAC-Bayes generalization bounds: see "Non-vacuous Generalization Bounds at the ImageNet Scale: a PAC-Bayesian Compression Approach".