# 4.1 Nonparametric methods

In the previous two chapters we considered estimation and testing
problems concerning random variables with some known probability
distribution (e.g. normal) but where one or more parameters were unknown
(e.g. mean and/or variance). The estimation and testing methods we used
are called **parametric methods**. The meaningfulness of the results of
a parametric test depends entirely on the validity of the assumptions
made about the analytical form of the distribution. However, in real
settings, it is not uncommon for the experimenter to question the
parametric assumptions.

Consider a random sample $D_N \leftarrow \mathbf z$ collected
through some experimental observation and for which no hint about the
underlying probability distribution $F_{\mathbf z}(\cdot)$ is
available. Suppose we want to estimate a parameter of interest $\theta$
of the distribution of $\mathbf z$ by using the estimate
$\hat{\theta}=t(D_N)$. What can we say about the accuracy of the
estimator $\hat{\theta}$? As shown in Section
3.5.2, for some specific parameters (e.g. mean and
variance) the accuracy can be estimated independently of the parametric
distribution. In most cases, however, the assessment of the estimator is
not possible unless we know the underlying distribution. What can be
done, then, if the distribution is not available? A solution is provided
by the so-called **nonparametric** or **distribution-free** methods,
which work independently of any specific assumption about the
probability distribution.
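To make the contrast concrete, here is a minimal Python sketch (an illustration not taken from the text; the exponential sample stands in for an "unknown" distribution): the standard error of the sample mean has the distribution-free form $\sqrt{\widehat{\mathrm{Var}}[\mathbf z]/N}$, while a statistic like the median has no comparable closed form.

```python
import random
import statistics

random.seed(0)
# A sample from a distribution the experimenter is assumed not to know
# (here, exponential with rate 1, for illustration only).
data = [random.expovariate(1.0) for _ in range(200)]
n = len(data)

# Accuracy of the sample mean: the standard error sqrt(Var/N) can be
# estimated without any assumption on the analytical form of F_z.
se_mean = (statistics.variance(data) / n) ** 0.5
print(se_mean)

# For the sample median, no such closed-form standard error exists
# without distributional assumptions; assessing its accuracy calls for
# a resampling method like those presented in this chapter.
median_hat = statistics.median(data)
```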

The adoption of these methods has recently enjoyed considerable success
thanks to the rapid growth of computing power. In fact, most techniques
for nonparametric estimation and testing are based on *resampling
procedures*, which require a large number of repeated (and nearly
identical) computations on the data.

This chapter will deal with two resampling strategies for estimation and two for hypothesis testing.

- **Jackknife**: this approach to nonparametric estimation relies on repeated computations of the statistic of interest for all the combinations of the data where one or more of the original samples are removed. It will be presented in Section 4.3.
- **Bootstrap**: this approach to nonparametric estimation attempts to estimate the sampling distribution of an estimator by generating new samples drawn (with replacement) from the original data. It will be introduced in Section 4.4.
- **Randomization**: this is a testing procedure based on resampling without replacement. It consists in taking the original data and scrambling either the order or the association of the original observations. It will be discussed in Section 4.6.
- **Permutation**: this is a two-sample hypothesis testing procedure based on repeated permutations of the dataset. It will be presented in Section 4.7.
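The strategies above can be sketched in a few lines of Python (a minimal illustration, not the implementations developed later in the chapter; the statistic, sample sizes, and numbers of resamples `B` and `P` are arbitrary choices):

```python
import random
import statistics

random.seed(42)
data = [random.gauss(0.0, 1.0) for _ in range(50)]
n = len(data)

def t(sample):
    """Statistic of interest: here, simply the sample mean."""
    return statistics.mean(sample)

# Jackknife: recompute the statistic on each leave-one-out subsample.
jack = [t(data[:i] + data[i + 1:]) for i in range(n)]

# Bootstrap: draw n points with replacement, B times, and use the
# spread of the replicates as an accuracy estimate.
B = 1000
boot = [t([random.choice(data) for _ in range(n)]) for _ in range(B)]
se_boot = statistics.stdev(boot)

# Permutation-style two-sample test: repeatedly scramble the
# association between observations and groups (resampling without
# replacement) to build a null distribution for the observed statistic.
x = data[:25]
y = [v + 0.5 for v in data[25:]]  # second sample, artificially shifted
obs = t(x) - t(y)
pooled = x + y
P = 1000
count = 0
for _ in range(P):
    random.shuffle(pooled)
    if abs(t(pooled[:25]) - t(pooled[25:])) >= abs(obs):
        count += 1
p_value = count / P
```

Note that all three procedures evaluate the same statistic `t` many times on perturbed versions of the data, which is why their practical feasibility hinges on cheap computation.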