Batch normalization: Cosmos — All that is, or was, or ever will be

Batch normalization

cosmos 13th May 2019 at 7:17pm

A method to increase learning rate for deep neural networks, as well as regularize them.

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

A mean field theory of batch normalization

They suggest that a change in the distribution of activations because of parameter updates slows learning. They call this change the internal covariate shift. They propose to normalize the activations (to have zero mean and unit variance) to avoid this.

These have been found to work better for Convolutional neural networks for Image classification. I think this may be because for images, you care more about relative differences between pixel values, and so normalizing helps remove unimportant information.