1 Introduction

In recent years, a growing number of organizations have been allocating vast amounts of resources to construct and maintain databases and data warehouses. In scientific endeavours, data refer to carefully collected observations about some phenomenon under study. In business, data capture information about economic trends, critical markets, competitors and customers. In manufacturing, data record machinery performance and production rates under different conditions. There are essentially two reasons why people gather increasing volumes of data: first, they believe that valuable assets are implicitly coded within them; second, computer technology enables effective data storage at reduced cost.

The idea of extracting useful knowledge from volumes of data is common to many disciplines, from statistics to physics, from econometrics to system identification and adaptive control. The procedure for finding useful patterns in data is known by different names in different communities, viz., knowledge extraction, pattern analysis, data processing. More recently, the set of computational techniques and tools that support the modelling of large amounts of data has been grouped under the more general label of machine learning [46].

The need for programs that can learn was stressed by Alan Turing, who argued that it may be too ambitious to write from scratch programs for tasks that even humans must learn to perform. This handbook aims to present the statistical foundations of machine learning, intended as the discipline which deals with the automatic design of models from data. In particular, we focus on supervised learning problems (Figure 1.1), where the goal is to model the relation between a set of input variables and one or more output variables, which are considered to depend on the inputs in some manner.

Figure 1.1: The supervised learning setting. Machine learning aims to infer from observed data the best model of the stochastic input/output dependency.

Since the handbook deals with artificial learning methods, we do not consider the biological or cognitive plausibility of the learning methods we present. Learning is postulated here as a problem of statistical estimation of the dependencies between variables on the basis of data.

The relevance of statistical analysis arises as soon as there is a need to extract useful information from data records obtained by repeatedly measuring an observed phenomenon. Suppose we are interested in learning about the relationship between two variables $x$ (e.g. the height of a child) and $y$ (e.g. the weight of a child) which are quantitative observations of some phenomenon of interest (e.g. obesity during childhood). Sometimes, a priori knowledge describing the relation between $x$ and $y$ is available. In other cases, no satisfactory theory exists and all that we can use are repeated measurements of $x$ and $y$. In this handbook our focus is the second situation, where we assume that only a set of observed data is available. The reasons for addressing this problem are essentially two. First, the more complex the input/output relation, the less effective the contribution of a human expert in extracting a model of it. Second, data-driven modelling may be a valuable support for the designer even in tasks where existing knowledge is available.
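To make the data-driven setting concrete, the following minimal sketch illustrates it in Python: given repeated measurements of $x$ and $y$ (the numbers here are invented purely for illustration), a simple least-squares linear model of the dependency is estimated from the observed pairs and then used to predict the output for an unseen input.

```python
import numpy as np

# Hypothetical measurements: child height x (cm) and weight y (kg).
# The underlying height/weight dependency is unknown to the modeller;
# only these observed pairs are available.
x = np.array([110.0, 118.0, 125.0, 131.0, 140.0, 147.0, 152.0])
y = np.array([18.5, 21.0, 24.5, 27.0, 33.0, 38.5, 42.0])

# With no a priori theory relating x and y, estimate the dependency
# directly from the data, here with a least-squares linear fit.
slope, intercept = np.polyfit(x, y, deg=1)

# The fitted model predicts the weight of an unseen child from its height.
x_new = 135.0
y_hat = slope * x_new + intercept
print(f"predicted weight at {x_new} cm: {y_hat:.1f} kg")
```

The linear form is of course only one possible model class; the statistical question of how to choose, fit and assess such models from data is precisely the subject of the chapters that follow.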