CS231n Lecture 13 - Segmentation, soft attention, spatial transformers
See here for state of the art in semantic segmentation (2019). See also Object detection
ReSeg: A Recurrent Neural Network-based Model for Semantic Segmentation
The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation
Convolutional neural networks with skip connections, as in residual neural networks
Densely Connected Convolutional Networks
Semantic Object Parsing With Local-Global Long Short-Term Memory – local guidance from neighboring positions and global guidance from the whole image are imposed on each position to better exploit complex local and global contextual information.
Fully Convolutional Networks for Semantic Segmentation – We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations.
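The FCN skip architecture can be sketched as follows. This is a minimal numpy toy (not the paper's implementation): the deep, coarse class scores are upsampled 2x and added to the shallower pool4 scores, then upsampled to image resolution, in the style of FCN-16s. Nearest-neighbor upsampling stands in for the learned deconvolution layers, and all names (`fcn_skip`, `score_pool4`, `score_fc7`) are illustrative.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor 2x upsampling of a (C, H, W) score map.
    # (The paper uses learned deconvolution; this is a stand-in.)
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fcn_skip(score_pool4, score_fc7, n_up_to_image=4):
    """FCN-16s-style fusion (sketch): upsample the deep, coarse class
    scores 2x, add the shallow pool4 scores at the same resolution,
    then upsample the fused scores to image resolution.
    score_pool4: (C, 2h, 2w) scores from a shallow, fine layer.
    score_fc7:   (C, h, w)   scores from the deep, coarse layer.
    """
    fused = upsample2x(score_fc7) + score_pool4
    for _ in range(n_up_to_image):  # 2**4 = 16x back to pixels
        fused = upsample2x(fused)
    return fused

# Toy usage: 21 classes (PASCAL VOC), coarse 2x2 and fine 4x4 score maps.
coarse = np.zeros((21, 2, 2))
fine = np.ones((21, 4, 4))
out = fcn_skip(fine, coarse)
print(out.shape)  # (21, 64, 64)
```

The per-pixel class prediction would then be `out.argmax(axis=0)`.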
Hypercolumns for Object Segmentation and Fine-grained Localization – Recognition algorithms based on convolutional networks (CNNs) typically use the output of the last layer as a feature representation. However, the information in this layer may be too coarse spatially to allow precise localization. On the other hand, earlier layers may be precise in localization but will not capture semantics. To get the best of both worlds, we define the hypercolumn at a pixel as the vector of activations of all CNN units above that pixel. Using hypercolumns as pixel descriptors, we show results on three fine-grained localization tasks.
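The hypercolumn construction can be sketched in a few lines: upsample each layer's feature map to image resolution and stack them along the channel axis, so each pixel gets the activations of all units above it. A minimal numpy sketch, assuming integer upsampling factors and using nearest-neighbor interpolation where the paper uses bilinear; the function name `hypercolumn` is illustrative.

```python
import numpy as np

def hypercolumn(feature_maps, out_hw):
    """Stack per-pixel descriptors from several CNN layers.
    feature_maps: list of arrays shaped (C_i, H_i, W_i), one per layer.
    out_hw: (H, W) image resolution; H, W assumed divisible by H_i, W_i.
    Returns an array of shape (sum C_i, H, W): column (:, y, x) is the
    hypercolumn descriptor of pixel (y, x).
    """
    H, W = out_hw
    cols = []
    for fmap in feature_maps:
        c, h, w = fmap.shape
        # Nearest-neighbor upsampling to image resolution (toy stand-in
        # for the bilinear interpolation used in the paper).
        up = fmap.repeat(H // h, axis=1).repeat(W // w, axis=2)
        cols.append(up)
    return np.concatenate(cols, axis=0)

# Toy usage: a fine 4x4 map with 8 channels and a coarse 2x2 map with 16.
maps = [np.random.rand(8, 4, 4), np.random.rand(16, 2, 2)]
hc = hypercolumn(maps, (8, 8))
print(hc.shape)  # (24, 8, 8)
```

A per-pixel classifier (e.g. logistic regression) is then trained on these 24-dimensional descriptors rather than on the last layer alone.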