# 8 Nonlinear approaches

In the previous chapter we considered input/output regression problems, where the relation between input and output is linear, and classification problems, where the optimal decision boundaries are linear.

The advantages of linear models are numerous:

- the least-squares estimate $\hat{\beta}$ can be expressed in analytical form and easily calculated through matrix computation;
- the statistical properties of the estimator can be easily derived;
- recursive formulations for sequential updating are available.
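The first and third properties can be illustrated with a small sketch (the data, the true coefficients, and all numerical settings below are assumptions chosen for illustration): the closed-form estimate $\hat{\beta} = (X^{T}X)^{-1}X^{T}y$ is obtained by a single matrix computation, while a recursive least-squares update processes one sample at a time and converges to the same solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear data: y = X beta + noise (beta_true is an illustrative choice)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.5, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Closed-form least-squares estimate: solve (X'X) beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Recursive least squares: update the estimate one sample at a time
P = 1e6 * np.eye(p)              # large initial covariance (weak prior)
beta_rls = np.zeros(p)
for x_t, y_t in zip(X, y):
    k = P @ x_t / (1.0 + x_t @ P @ x_t)          # gain vector
    beta_rls = beta_rls + k * (y_t - x_t @ beta_rls)
    P = P - np.outer(k, x_t @ P)

# The sequential estimate agrees with the batch solution (up to numerical error)
assert np.allclose(beta_hat, beta_rls, atol=1e-4)
```

The recursive form never stores the whole dataset: only the current estimate and the $p \times p$ matrix $P$ are kept between updates.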

Unfortunately, in real problems it is extremely unlikely that the input and output variables are linked by a linear relation. Moreover, the form of the relation is often unknown and only a limited amount of samples is available. Over the years, statisticians and machine learning researchers have proposed a number of nonlinear approaches, with the aim of finding approximators able to combine high generalization with effective learning procedures. The presentation of these techniques could be organized according to several criteria and principles. In this chapter we will focus on the distinction between global and divide-and-conquer approaches.

A family of models traditionally used in supervised learning is the
family of *global models* which describes the relationship between the
input and the output values as a single analytical function over the
whole input domain (Fig.8.1). In
general, this makes sense when it is reasonable to believe that a
physical-like law describes the data over the whole set of operating
conditions.

Examples of well-known global parametric models in the literature are the linear models discussed in the previous chapter, generalized linear models, and neural networks, which will be presented in Section 8.1.1.

A nice property of global modeling is that, even for huge datasets, a parametric model can be stored in a small memory. Moreover, the evaluation of the model requires a short program that can be executed in a reduced amount of time. These features have undoubtedly contributed to the success of the global approach in years when most computing systems imposed severe limitations on users.

However, for a generic global model, the parametric identification (Section 5.2) consists of a nonlinear optimization problem (see Equation 5.2.5) which is not analytically tractable due to the numerous local minima and for which only a suboptimal solution can be found through a slow iterative procedure. Similarly, the problem of selecting the best model structure in a generic nonlinear case cannot be handled in analytical form and requires time-consuming validation procedures.
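As a minimal sketch of why parametric identification becomes iterative in the nonlinear case (the model $f(x) = a\tanh(bx)$, the data, and all numerical settings are illustrative assumptions), consider gradient descent on a squared-error cost that is nonconvex in the parameters: no closed-form solution exists, and the procedure can only refine the estimate step by step from a chosen starting point.

```python
import numpy as np

rng = np.random.default_rng(1)

# Data from a nonlinear mapping (model form chosen for illustration)
x = rng.uniform(-3, 3, size=100)
y = 2.0 * np.tanh(1.5 * x) + 0.05 * rng.normal(size=100)

# Model: f(x) = a * tanh(b * x). The squared-error cost is nonconvex in (a, b),
# so we iterate from an initial guess; a different start may reach a different
# (possibly suboptimal) stationary point.
a, b = 0.5, 0.5
lr = 0.01
for _ in range(5000):
    pred = a * np.tanh(b * x)
    err = pred - y
    grad_a = 2 * np.mean(err * np.tanh(b * x))
    grad_b = 2 * np.mean(err * a * x / np.cosh(b * x) ** 2)
    a -= lr * grad_a
    b -= lr * grad_b

mse = np.mean((a * np.tanh(b * x) - y) ** 2)
```

Contrast this with the linear case above, where a single matrix computation returns the global optimum directly.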

For these reasons, in recent years, alternatives to global modeling
techniques, such as the *divide-and-conquer* approach, have gained popularity in
the modeling community. The *divide-and-conquer* principle consists of
attacking a complex problem by dividing it into simpler problems whose
solutions can be combined to yield a solution to the original problem.
This principle presents two main advantages. The first is that simpler
problems can be solved with simpler estimation techniques: in
statistical language, this means adopting linear techniques, which have been well studied
and developed over the years. The second is that the learning method can
better adjust to the properties of the available dataset. Training data
are rarely distributed uniformly in the input space. Whenever the
distribution of patterns in the input space is uneven, a proper local
adjustment of the learning algorithm can significantly improve the
overall performance.

We will focus on two main instances of the divide-and-conquer principle: the modular approach, which originated in the field of system identification, and the local modeling approach, which was first proposed in the statistical nonparametric literature.

*Modular* architectures are input/output approximators composed of a
number of modules which cover different regions of the input space. This
is the idea of *operating regimes*, which proposes a partitioning of the
operating range of the system as a more effective way to solve modeling
problems (Section 8.1.2).

Although these architectures are a modular combination of local models, their learning procedure is still performed on the basis of the whole dataset. Hence, learning in modular architectures remains a functional estimation problem, with the advantage that the parametric identification can be made simpler by the adoption of local linear modules. However, in terms of structural identification the problem is still nonlinear and requires the same procedures used for generic global models.
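A minimal sketch of the operating-regimes idea, under illustrative assumptions (the target function, a fixed partition of the input range into intervals, and one linear module per interval are all choices made here for the example; real operating-regime methods typically also identify the partition from data):

```python
import numpy as np

rng = np.random.default_rng(2)

# Nonlinear 1-D mapping approximated by linear modules over operating regimes
x = np.sort(rng.uniform(0, 2 * np.pi, size=300))
y = np.sin(x) + 0.05 * rng.normal(size=300)

# Partition the operating range into fixed intervals (illustrative choice)
edges = np.linspace(0, 2 * np.pi, 7)           # 6 regimes
models = []
for lo, hi in zip(edges[:-1], edges[1:]):
    m = (x >= lo) & (x <= hi)
    A = np.column_stack([np.ones(m.sum()), x[m]])   # local linear module
    coef, *_ = np.linalg.lstsq(A, y[m], rcond=None)
    models.append((lo, hi, coef))

def predict(q):
    """Evaluate the linear module whose regime contains the query q."""
    for lo, hi, (b0, b1) in models:
        if lo <= q <= hi:
            return b0 + b1 * q
    raise ValueError("query outside the operating range")

# The piecewise-linear combination tracks the nonlinear mapping reasonably well
assert abs(predict(np.pi / 2) - np.sin(np.pi / 2)) < 0.15
```

Note that each module is fitted by ordinary least squares, so the parametric identification is linear within each regime, as the text observes; only the choice of the partition remains a nonlinear structural problem.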

A second example of divide-and-conquer methods is *local modeling*
techniques (Section 8.1.10), which turn the problem of
function estimation into a problem of value estimation. The goal is not to
model the whole statistical phenomenon but to return the best output for
a given test input, hereafter called the *query*. The motivation is
simple: why should the problem of estimating the values of an unknown
function at given points of interest be solved in two stages? Global
modeling techniques first estimate the function (*induction*) and second
estimate the values of the function using the estimated function
(*deduction*). In this two-stage scheme one actually tries to solve a
relatively simple problem (estimating the values of a function at given
points of interest) by first solving, as an intermediate problem, a much
more difficult one (estimating the function).

Local modeling techniques take an alternative approach, defined as *transduction* by Vapnik [115]
(Fig.8.2). They focus on approximating the function only in the neighborhood of the point to be predicted.

This approach requires keeping the dataset in memory for each prediction, instead of discarding it as in the global modeling case. At the same time, local modeling requires only simple approximators, e.g. constant and/or linear, to model the dataset in a neighborhood of the query point. An example of local linear modeling in the case of a single-input single-output mapping is presented in Fig.8.3.
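A minimal sketch of local linear modeling for a single-input single-output mapping (the cubic target, the neighborhood size, and all numerical settings are illustrative assumptions): the whole dataset stays in memory, and each query triggers a fresh linear fit restricted to the neighbors of the query point.

```python
import numpy as np

rng = np.random.default_rng(3)

# The full dataset is kept in memory; no global model is ever fitted
x = rng.uniform(-2, 2, size=400)
y = x ** 3 + 0.05 * rng.normal(size=400)   # nonlinear target (illustrative)

def local_linear_predict(q, k=30):
    """Fit a linear model on the k nearest neighbours of the query q
    and return its value at q."""
    idx = np.argsort(np.abs(x - q))[:k]              # neighborhood of the query
    A = np.column_stack([np.ones(k), x[idx] - q])    # centre inputs on the query
    coef, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
    return coef[0]   # intercept of the centred fit = prediction at q

q = 0.8
assert abs(local_linear_predict(q) - q ** 3) < 0.1
```

Centring the inputs on the query makes the intercept of the local fit directly the predicted value, a common convenience in locally weighted regression.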

Many names have been used in the past to label variations of the local
modeling approach: memory-based reasoning [108],
case-based reasoning [77],
locally weighted regression [27],
nearest neighbor [30],
just-in-time [32],
lazy learning [3],
exemplar-based, instance-based [2],... These approaches are also called
*nonparametric* in the
literature [60,
107], since they relax the assumptions on the
form of the regression function and let the data determine a suitable
function that describes the available data well.

In the following we will present in detail some machine learning techniques for nonlinear regression and classification.