Zero-shot learning

cosmos 1st September 2017 at 6:23pm
Learning

In zero-shot visual recognition, the semantic gap is the biggest hurdle; i.e., the distribution of instances in visual space is often distinct from that of their underlying semantics in semantic space as visual features in various forms may convey the same concept. This semantic gap results in a great difficulty in transferring knowledge on known classes to unseen classes. Apart from the semantic gap issue, the hubness (Radovanovi´c et al. 2010) is recently identified as a cause that accounts for the poor performance of most existing ZSL models (Dinu et al. 2015; Shigeto et al. 2015; Xu et al. 2015b). “Hubness” refers to the phenomenon that some instances (referred to as hubs) in the high-dimensional space appear to be the nearest neighbors of a large number of instances. When nearest-neighbour based algorithms are applied, test instances are likely to be close to those “hubs” regardless of their labels and hence incorrectly labeled as labels of “hubs”. In ZSL, the “hubness” phenomenon becomes more severe. Apart from the intrinsic property of high-dimensional space (Radovanovi´c et al. 2010), the hubness is exacerbated by a lack of training instances belonging to unseen classes in visual domain and the domain shift problem, where the distribution of training data is different from that of test data, which often occurs in ZSL (Fu et al. 2015; Zhang and Saligrama 2016b).

See also One-shot learning

Zero-Shot Visual Recognition via Bidirectional Latent Embedding

The motivation behind our proposed framework is twofold:

(a) to narrow the semantic gap, a latent space is learned from visual representations of training data in a supervised manner by preserving intrinsic structures underlying visual data and promoting the discriminative capability simultaneously. This is done through a supervised version of the Dimensionality reduction technique Locality preserving projection (supervised locality preserving projection (SLPP) (Cheng et al. 2005)) on the high-dimensional features (from Convolutional neural network for e.g.)

and (b) for knowledge transfer, the semantic representations of unseen-class labels are then embedded into the learned latent space of favorable properties by taking into account both the embedding of training-class labels and the semantic relatedness between all different classes; i.e., not only the relationships between known and unseen classes but also that between unseen classes. For this they introduce a version of Sammon mapping (Sammon 1969), named landmark-based Sammon mapping (LSM)