Determining whether certain objects are present in an image and, if so, where they are located.
CS231n Lecture 8 - Localization and Detection
http://zoey4ai.com/2018/05/12/deep-learning-object-detection/
–>Localization as a regression problem
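Localizing a fixed number of objects can be treated as regression on the box coordinates; this has been used for human pose estimation, where a fixed number of joints is predicted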
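A minimal sketch of localization as regression, assuming each object is described by four numbers (x, y, w, h) and the loss is a plain sum of squared errors. The function name and the sample coordinates are made up for illustration.

```python
def l2_box_loss(predicted, target):
    """Sum of squared differences between two flat coordinate lists."""
    if len(predicted) != len(target):
        raise ValueError("fixed-size outputs must have matching lengths")
    return sum((p - t) ** 2 for p, t in zip(predicted, target))


# One object -> 4 numbers (x, y, w, h); K objects -> 4 * K numbers.
predicted_box = [48.0, 52.0, 100.0, 80.0]    # illustrative network output
ground_truth = [50.0, 50.0, 100.0, 84.0]     # illustrative label
loss = l2_box_loss(predicted_box, ground_truth)
print(loss)
```

Because the output length is fixed in advance, the same loss works for K joints in pose estimation by concatenating the K coordinate pairs.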
video – efficient implementation
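Region proposal networks (RPNs) come from Faster R-CNN (Microsoft Research); the ResNet detection results were built on them
Problem with detection is that the number of objects varies, so we have variable-sized outputs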
Method: sliding window. Run a classifier on many input regions, which requires a really fast classifier. With CNNs each evaluation is expensive, so we need region proposals to avoid trying every location!
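To see why trying everywhere is costly, here is a small sketch that enumerates sliding windows for a single window size. The image size, window size, and stride are arbitrary assumptions chosen for illustration.

```python
def sliding_windows(image_w, image_h, win_w, win_h, stride):
    """Yield (x, y, w, h) for every window that fits fully inside the image."""
    for y in range(0, image_h - win_h + 1, stride):
        for x in range(0, image_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)


# A single scale and aspect ratio on a 640x480 image.
windows = list(sliding_windows(640, 480, 64, 64, 16))
print(len(windows))  # 999
```

That is 999 classifier evaluations for one window shape; covering several scales and aspect ratios multiplies the count, which is what makes a slow classifier impractical here.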
Region proposals + CNN classifiers: R-CNN, Fast R-CNN, and Faster R-CNN
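The general shape of a proposals-plus-classifier pipeline can be sketched as below. This is only an outline of the idea, not the R-CNN implementation: the image is a nested list, and `classify` is a hypothetical stand-in for a CNN that returns a (label, score) pair.

```python
def detect_with_proposals(image, proposals, classify, threshold=0.5):
    """Classify only the proposed regions and keep the confident ones."""
    detections = []
    for (x, y, w, h) in proposals:
        # Crop the proposed region out of the image (rows, then columns).
        crop = [row[x:x + w] for row in image[y:y + h]]
        label, score = classify(crop)
        if score >= threshold:
            detections.append(((x, y, w, h), label, score))
    return detections


def toy_classifier(crop):
    """Hypothetical classifier: score is the mean pixel value of the crop."""
    pixels = [p for row in crop for p in row]
    return ("object", sum(pixels) / len(pixels))


toy_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
toy_proposals = [(0, 0, 2, 2), (2, 0, 2, 2)]
print(detect_with_proposals(toy_image, toy_proposals, toy_classifier))
```

The classifier runs once per proposal instead of once per sliding-window position, so the cost scales with the number of proposals rather than with the image size.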
YOLO (You Only Look Once): detection as regression. YOLO is fast!
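A rough sketch of what "detection as regression" means in the original YOLO formulation, assuming an S x S grid where each cell predicts B boxes of five numbers (x, y, w, h, confidence) plus C class scores. The function names are made up for illustration, and later YOLO versions parameterize boxes differently.

```python
def yolo_output_size(S, B, C):
    """Length of the fixed-size output vector: S * S * (B * 5 + C)."""
    return S * S * (B * 5 + C)


def decode_box(row, col, x, y, w, h, S, image_w, image_h):
    """Convert one cell-relative prediction to corner coordinates.

    x, y are offsets inside the grid cell (0..1);
    w, h are fractions of the whole image (0..1).
    """
    center_x = (col + x) / S * image_w
    center_y = (row + y) / S * image_h
    box_w = w * image_w
    box_h = h * image_h
    return (center_x - box_w / 2, center_y - box_h / 2,
            center_x + box_w / 2, center_y + box_h / 2)


print(yolo_output_size(7, 2, 20))  # 1470
print(decode_box(3, 3, 0.5, 0.5, 0.5, 0.25, 7, 448, 448))
```

The variable number of detections is handled by always regressing the same fixed-size vector and then discarding low-confidence boxes, which is why a single forward pass suffices.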
A MultiPath Network for Object Detection (using Skip-connections)
Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks. Contextual information outside the region of interest is integrated using spatial recurrent neural networks. Inside, we use skip pooling to extract information at multiple scales and levels of abstraction – video
YOLOv3
See DenseCap under Image captioning, and its references, for more.