Statistical foundations of machine learning

We are in the era of big data. There are essentially two reasons why people gather increasing volumes of data: first, they believe that valuable information is implicitly encoded in them; second, computer technology enables effective data storage at reduced cost. The idea of extracting useful knowledge from volumes of data is common to many disciplines, from statistics to physics, from econometrics to system identification and adaptive control. The procedure for finding useful patterns in data is known by different names in different communities, but increasingly the set of computational techniques and tools that support the modelling of large amounts of data is grouped under the label of machine learning.

This handbook aims to present the statistical foundations of machine learning, understood as the discipline that deals with the automatic design of models from data. In particular, we focus on supervised learning problems, where the goal is to model the relation between a set of input variables and one or more output variables, which are assumed to depend on the inputs in some manner. Since the handbook deals with artificial learning methods, we do not consider arguments about the biological or cognitive plausibility of the methods we present. Learning is postulated here as a problem of statistical estimation of the dependencies between variables on the basis of data.

This manuscript aims to strike a good balance between theory and practice by situating most of the theoretical notions in a real context, with the help of practical examples and real datasets. All the examples are implemented in the statistical programming language R. This practical emphasis is particularly important since machine learning techniques are increasingly embedded in many technological domains, such as bioinformatics, robotics, intelligent control, speech and image recognition, multimedia, web and data mining, computational finance and business intelligence.

We are still reviewing and editing the book, so feel free to send comments, point out mistakes, inconsistencies or missing references, and suggest improvements on the main page of the book.

Gianluca Bontempi
Souhaib Ben Taieb
September 2013