http://cs231n.github.io/convolutional-networks/
http://cs231n.stanford.edu/syllabus.html
See this review of Geometric deep learning for understanding why CNNs work, and other new related models
[MISS 2016] Andrea Vedaldi - Advanced Convolutional Neural Networks
Spatial transformer network – a variant to allow for more general invariances to spatial transformations of the data
Convolution
Convolutional arithmetic – more explanation here (includes deconv nets)
In The "c1 feature maps" are a set of 2D arrays of neurons. Each array looks for a feature, and a point in the array would represent the location of that feature. To accomplish this, that point of that array is connected to a set of pixels centered in the corresponding point in the input image (an array of pixels). We have much less parameters because for each of these 2D arrays we only specify the parameters for one of the neruons in that array, all other neurons are identical, just connected to displaced sets of pixels.
What is convolution. Correlation. Flip parameters vector (or array..) and rewrite the correlation, we get a convolution. Of course, there's much more to convolutions, including the convolution theorem for e.g.
Stride How much you jump in pixel space (or in previous layer) when you move from one point to another in a feature layer.
Can also expand boundary (zero padding) to keep layer gotten by convolution is of same size as original layer.
Pooling
This is what it does. It downsamples. For memory, and invariance (being more insesitive to perturbations).
We can also apply non-linearities in between layers of course, like for contours enhacement
Use as many of these layers Iconvolutions and poolings) as we can train, 20+ (Deep learning)
At the end we may have a fully connected neural layer, to do the classification, but researchers are questioning if it is that useful..
We may visualize the features in the feature maps by visualizing the matrices of parameters.
MaxPooling coarse grains by chosing the maximum value in a certain patch of a few pixels. Nearby pixels are likely to be similar, and it is useful to distinguish between the same pattern been seen several times by the filter convolving around it, or several distinct repetitions of the pattern. Maxpool tries to solve for the former fake repetition..
Sentence DynConvNet
Document models (Misha Denil)
Visualizing the representation can use t-SNE to visualize
Deep Visualization Toolbox – code
Guided backpropagation is like ReLu backwards, only backpropagating positive gradients.
Combine with grabcut segmentation algorithm to cut out the important part of an image
can do optimization for any neuron. There are better ways of regularizing for images, they do it indirectly (kind of like Dropout), by blurring.
How much can you reconstruct an image from the code
if one specifies the convolutional tensors to be complex wavelet decomposition operators and uses complex modulus as pointwise nonlinearities, one can provably obtain stability to local deformations (17) . Although this stability is not rigorously proved for generic compactly supported convolutional tensors, it underpins the empirical success of CNN architectures across a variety of computer-vision applications [1] . See here .