In feature selection, we would like to select a subset of the features (the input variables to a supervised learning algorithm) that are most relevant to a specific learning problem, so as to get a simpler hypothesis class and reduce the risk of overfitting. This is most useful when we have many features.
With n features there are 2^n possible subsets, so rather than enumerating this huge space we search it with heuristics.
"Wrapper" feature selection
vid. Feature selection methods that repeatedly call your learning algorithm on candidate subsets (e.g. forward search, which greedily adds the single feature that most improves performance). They work well, but are computationally expensive.
The number of features to include can be chosen by optimizing generalization error (estimated by cross-validation), or by choosing a plausible number. A minimal forward-search sketch follows.
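A minimal sketch of wrapper feature selection via forward search, assuming scikit-learn; the logistic-regression learner, 5-fold cross-validation, and accuracy scoring are illustrative choices, not prescribed by the notes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def forward_search(X, y, max_features=None):
    """Greedily add the feature that most improves cross-validated accuracy."""
    n_features = X.shape[1]
    max_features = n_features if max_features is None else min(max_features, n_features)
    selected = []
    best_overall = (-np.inf, [])
    while len(selected) < max_features:
        best_score, best_feature = -np.inf, None
        for i in range(n_features):
            if i in selected:
                continue
            candidate = selected + [i]
            # Wrapper step: retrain and cross-validate the learner on the candidate subset.
            score = cross_val_score(LogisticRegression(max_iter=1000),
                                    X[:, candidate], y, cv=5).mean()
            if score > best_score:
                best_score, best_feature = score, i
        selected.append(best_feature)
        if best_score > best_overall[0]:
            best_overall = (best_score, list(selected))
    return best_overall  # (estimated generalization accuracy, chosen feature subset)
```

Each pass over the remaining features retrains the learner once per candidate, which is why wrapper methods are expensive: roughly O(n^2) training runs to consider all subset sizes.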
"Filter" feature selection
vid. Less computationally expensive, but often less effective. For each feature i, we compute some measure of how informative x_i is about y, for instance the mutual information MI(x_i, y) = Σ_{x_i, y} p(x_i, y) log[ p(x_i, y) / (p(x_i) p(y)) ] (with the probabilities estimated from the training data), and then keep the k highest-scoring features.
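A minimal sketch of filter feature selection using that mutual-information score, assuming discrete (e.g. binary) features and labels; function names here are illustrative.

```python
import numpy as np

def mutual_information(xi, y):
    """MI(x_i, y) = sum over values of p(x_i, y) * log( p(x_i, y) / (p(x_i) p(y)) )."""
    mi = 0.0
    for xv in np.unique(xi):
        for yv in np.unique(y):
            p_xy = np.mean((xi == xv) & (y == yv))   # empirical joint probability
            p_x, p_y = np.mean(xi == xv), np.mean(y == yv)
            if p_xy > 0:
                mi += p_xy * np.log(p_xy / (p_x * p_y))
    return mi

def filter_select(X, y, k):
    """Score each feature independently against y, then keep the top k."""
    scores = [mutual_information(X[:, i], y) for i in range(X.shape[1])]
    return np.argsort(scores)[::-1][:k]
```

Note the key difference from the wrapper approach: the learning algorithm is never called during selection, so the cost is one pass over the features instead of many training runs.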
Features can also be learned automatically, using unsupervised or supervised learning algorithms!
Restricted Boltzmann machine feature learning
See here
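A minimal sketch of unsupervised feature learning with an RBM, assuming scikit-learn's BernoulliRBM; the toy data and hyperparameters are illustrative only.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

X = np.random.rand(100, 64)          # toy inputs in [0, 1], e.g. flattened image patches
rbm = BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20, random_state=0)
rbm.fit(X)                           # unsupervised training: no labels needed
features = rbm.transform(X)          # hidden-unit activations, usable as learned features
```

The transformed activations can then be fed to any supervised learner in place of (or alongside) the raw features.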