In feature selection, we would like to select a subset of the features (the input variables to a supervised learning algorithm) that are most relevant to a specific learning problem, so as to get a simpler hypothesis class and reduce the risk of overfitting. This is most useful when we have many features.
With n features there are 2^n possible subsets, so rather than enumerating this huge space we search it with heuristics.
"Wrapper" feature selection
vid. Feature selection methods that repeatedly call your learning algorithm on candidate subsets (e.g. forward search, which greedily adds the single feature that most improves performance). They work well, but are computationally expensive.
The number of features to include can be chosen by optimizing generalization error (estimated by cross-validation), or by choosing a plausible number. A minimal forward-search sketch follows.
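A minimal sketch of wrapper feature selection via forward search, assuming scikit-learn; the logistic-regression learner, 5-fold cross-validation, and accuracy scoring are illustrative choices, not prescribed by the notes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def forward_search(X, y, max_features=None):
    """Greedily add the feature that most improves cross-validated accuracy."""
    n_features = X.shape[1]
    max_features = n_features if max_features is None else min(max_features, n_features)
    selected = []
    best_overall = (-np.inf, [])
    while len(selected) < max_features:
        best_score, best_feature = -np.inf, None
        for i in range(n_features):
            if i in selected:
                continue
            candidate = selected + [i]
            # Wrapper step: retrain and cross-validate the learner on the candidate subset.
            score = cross_val_score(LogisticRegression(max_iter=1000),
                                    X[:, candidate], y, cv=5).mean()
            if score > best_score:
                best_score, best_feature = score, i
        selected.append(best_feature)
        if best_score > best_overall[0]:
            best_overall = (best_score, list(selected))
    return best_overall  # (estimated generalization accuracy, chosen feature subset)
```

Each pass over the remaining features retrains the learner once per candidate, which is why wrapper methods are expensive: roughly O(n^2) training runs to consider all subset sizes.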
"Filter" feature selection
vid. Less computationally expensive, but often less effective. For each feature i, we compute some measure of how informative x_i is about y, for instance the mutual information MI(x_i, y) = Σ_{x_i, y} p(x_i, y) log[ p(x_i, y) / (p(x_i) p(y)) ] (with the probabilities estimated from the training data), and then keep the k highest-scoring features.
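A minimal sketch of filter feature selection using that mutual-information score, assuming discrete (e.g. binary) features and labels; function names here are illustrative.

```python
import numpy as np

def mutual_information(xi, y):
    """MI(x_i, y) = sum over values of p(x_i, y) * log( p(x_i, y) / (p(x_i) p(y)) )."""
    mi = 0.0
    for xv in np.unique(xi):
        for yv in np.unique(y):
            p_xy = np.mean((xi == xv) & (y == yv))   # empirical joint probability
            p_x, p_y = np.mean(xi == xv), np.mean(y == yv)
            if p_xy > 0:
                mi += p_xy * np.log(p_xy / (p_x * p_y))
    return mi

def filter_select(X, y, k):
    """Score each feature independently against y, then keep the top k."""
    scores = [mutual_information(X[:, i], y) for i in range(X.shape[1])]
    return np.argsort(scores)[::-1][:k]
```

Note the key difference from the wrapper approach: the learning algorithm is never called during selection, so the cost is one pass over the features instead of many training runs.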
Features can also be learned automatically, using unsupervised or supervised learning algorithms!
Restricted Boltzmann machine feature learning
See here
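A minimal sketch of unsupervised feature learning with an RBM, assuming scikit-learn's BernoulliRBM; the toy data and hyperparameters are illustrative only.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

X = np.random.rand(100, 64)          # toy inputs in [0, 1], e.g. flattened image patches
rbm = BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20, random_state=0)
rbm.fit(X)                           # unsupervised training: no labels needed
features = rbm.transform(X)          # hidden-unit activations, usable as learned features
```

The transformed activations can then be fed to any supervised learner in place of (or alongside) the raw features.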