10 Feature selection

Supervised learning algorithms were not originally designed to cope with large numbers of irrelevant variables, and their performance (prediction accuracy) is known to degrade when they are faced with many inputs (also known as features) that are not necessary for predicting the desired output.

In the feature selection problem, a learning algorithm must select a subset of features on which to focus its attention, while ignoring the rest. In many recent challenging problems the number of features amounts to several thousands: this is the case of microarray cancer classification tasks [105], where the number of variables (i.e. the number of genes for which expression is measured) may range from 6000 to 40000. New high-throughput measurement techniques (e.g. in biology) make it easy to foresee that this number could soon be exceeded by several orders of magnitude.

Using all available features in learning may negatively affect generalization performance, especially in the presence of irrelevant or redundant features. In that sense, feature selection can be seen as an instance of the model selection problem.
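The effect of irrelevant features on generalization can be observed on a toy problem. The following is a minimal sketch and is not taken from the text: it assumes a synthetic two-class dataset with two informative features, standard Gaussian noise as the irrelevant inputs, and a 1-nearest-neighbour classifier written with NumPy. The function names (make_data, one_nn_accuracy, accuracy_vs_noise) and all parameter values are illustrative assumptions.

```python
import numpy as np


def make_data(n, n_noise, rng):
    """Draw n samples: 2 informative features followed by n_noise noise features."""
    y = rng.integers(0, 2, size=n)
    # Assumed class structure: class 1 centred at (+1.5, +1.5), class 0 at (-1.5, -1.5).
    centers = np.where(y[:, None] == 1, 1.5, -1.5)
    informative = centers + rng.normal(size=(n, 2))
    # Irrelevant features: independent of the class label.
    noise = rng.normal(size=(n, n_noise))
    return np.hstack([informative, noise]), y


def one_nn_accuracy(X_train, y_train, X_test, y_test):
    """Test accuracy of a 1-nearest-neighbour classifier (Euclidean distance)."""
    # Squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = (
        (X_test ** 2).sum(axis=1)[:, None]
        + (X_train ** 2).sum(axis=1)[None, :]
        - 2.0 * X_test @ X_train.T
    )
    pred = y_train[d2.argmin(axis=1)]
    return float((pred == y_test).mean())


def accuracy_vs_noise(noise_counts, n_train=200, n_test=200, seed=0):
    """Accuracy when the first k noise features are kept, for each k in noise_counts."""
    rng = np.random.default_rng(seed)
    max_noise = max(noise_counts)
    X_tr, y_tr = make_data(n_train, max_noise, rng)
    X_te, y_te = make_data(n_test, max_noise, rng)
    results = {}
    for k in noise_counts:
        cols = slice(0, 2 + k)  # the 2 informative features plus k noise features
        results[k] = one_nn_accuracy(X_tr[:, cols], y_tr, X_te[:, cols], y_te)
    return results


results = accuracy_vs_noise([0, 10, 100, 500])
for k, acc in results.items():
    print(f"{k:4d} irrelevant features: accuracy = {acc:.3f}")
```

In this setting the accuracy tends to fall as noise features are added, because the noise coordinates come to dominate the Euclidean distance used by the classifier. Other learners are affected to different degrees, so the sketch illustrates the phenomenon rather than quantifying it.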

There are many potential benefits of feature selection [56, 57]:

  • facilitating data visualization and data understanding,

  • reducing the measurement and storage requirements,

  • reducing training and utilization times of the final model,

  • defying the curse of dimensionality to improve prediction performance.

At the same time, we should be aware that feature selection requires additional learning time. The search for a subset of relevant features introduces an additional layer of complexity in the modelling task: the search in the model hypothesis space is augmented by another dimension, that of finding the optimal subset of relevant features.
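To give a sense of the size of this additional search dimension: n features admit 2^n - 1 non-empty subsets. The sketch below counts them and enumerates them exhaustively for a tiny case. It is only an illustration, not a method proposed in the text; the names n_candidate_subsets and all_subsets are hypothetical helpers.

```python
from itertools import combinations


def n_candidate_subsets(n_features):
    """Number of non-empty subsets of n_features variables."""
    return 2 ** n_features - 1


def all_subsets(features):
    """Yield every non-empty subset of the given features, smallest subsets first."""
    features = list(features)
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            yield subset


# The count grows exponentially with the number of features.
for n in (10, 20, 40):
    print(n, "features ->", n_candidate_subsets(n), "candidate subsets")

# Exhaustive enumeration is only feasible for very small n.
small = list(all_subsets(["x1", "x2", "x3"]))
print(small)
```

Since each candidate subset would in principle require training and assessing a model, exhaustive enumeration quickly becomes impractical as the number of features grows.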