4.1 Nonparametric methods

In the previous two chapters we considered estimation and testing problems concerning random variables with a known probability distribution (e.g. normal) but with one or more unknown parameters (e.g. the mean and/or the variance). The estimation and testing methods we used are called parametric methods. The meaningfulness of the results of a parametric test depends entirely on the validity of the assumptions made about the analytical form of the distribution. In real settings, however, it is not uncommon for the experimenter to question the parametric assumptions.

Consider a random sample $D_N \leftarrow \mathbf{z}$ collected through some experimental observation and for which no hint about the underlying probability distribution $F_{\mathbf{z}}(\cdot)$ is available. Suppose we want to estimate a parameter of interest $\theta$ of the distribution of $\mathbf{z}$ by using the estimate $\hat{\theta}=t(D_N)$. What can we say about the accuracy of the estimator $\hat{\boldsymbol{\theta}}$? As shown in Section 3.5.2, for some specific parameters (e.g. the mean and the variance) the accuracy can be estimated independently of the parametric distribution. In most cases, however, the assessment of the estimator is not possible unless we know the underlying distribution. What can we do, then, if the distribution is not available? A solution is provided by the so-called nonparametric or distribution-free methods, which do not rely on any specific assumption about the probability distribution.
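To make the issue concrete, the following minimal sketch (Python with NumPy; the exponential data, the sample size and the choice of statistics are purely illustrative assumptions, not taken from the text) contrasts the mean, whose standard error admits a distribution-free estimate, with the median, for which no comparably simple formula is available without distributional assumptions.

```python
# Minimal sketch; data, sample size and statistics are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
D_N = rng.exponential(scale=2.0, size=50)      # stand-in for the observed sample D_N

# For the mean, a distribution-free accuracy estimate is available:
theta_hat = D_N.mean()
se_mean = D_N.std(ddof=1) / np.sqrt(len(D_N))  # estimated sigma / sqrt(N)
print(f"mean estimate {theta_hat:.3f}, estimated standard error {se_mean:.3f}")

# For other statistics (e.g. the median) no such simple formula exists
# without knowing the distribution: this is where resampling methods help.
theta_hat_med = np.median(D_N)
print(f"median estimate {theta_hat_med:.3f}, standard error unknown")
```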

The adoption of these methods has recently enjoyed considerable success thanks to the steady growth of computer processing power. In fact, most techniques for nonparametric estimation and testing rely on resampling procedures, which require a large number of repeated (and very similar) computations on the data.

This chapter will deal with four resampling strategies: two for estimation (jackknife and bootstrap) and two for hypothesis testing (randomization and permutation).

Jackknife
This approach to nonparametric estimation relies on repeated computations of the statistic of interest on the datasets obtained by removing one or more of the original observations. It will be presented in Section 4.3 (a minimal code sketch follows this list).
Bootstrap
This approach to nonparametric estimation attempts to estimate the sampling distribution of a statistic by generating new samples drawn with replacement from the original data. It will be introduced in Section 4.4 (also sketched after the list).
Randomization
This is a resampling-without-replacement testing procedure. It consists in taking the original data and scrambling either their order or the association between the observed variables. It will be discussed in Section 4.6 (sketched after the list).
Permutation
This is a resampling two-sample hypothesis testing procedure based on repeated permutations of the dataset. It will be presented in Section 4.7 (see the sketch after the list).
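As a preview of Section 4.3, here is a minimal sketch of a leave-one-out jackknife estimate of the standard error of a statistic (Python with NumPy; the choice of the median as statistic and the exponential data are illustrative assumptions, not prescribed by the text).

```python
# Minimal jackknife sketch; statistic and data are illustrative assumptions.
import numpy as np

def jackknife_se(data, statistic):
    """Leave-one-out jackknife estimate of the standard error of `statistic`."""
    n = len(data)
    # Recompute the statistic on the N datasets obtained by removing one observation.
    loo = np.array([statistic(np.delete(data, i)) for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))

rng = np.random.default_rng(0)
D_N = rng.exponential(scale=2.0, size=50)
print("jackknife SE of the median:", jackknife_se(D_N, np.median))
```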
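Similarly, a minimal bootstrap sketch (again Python with NumPy; the number of resamples B = 1000, the median and the data are illustrative assumptions) approximates the sampling distribution of the statistic by drawing with replacement from the original sample; the full treatment is in Section 4.4.

```python
# Minimal bootstrap sketch; B, statistic and data are illustrative assumptions.
import numpy as np

def bootstrap_se(data, statistic, B=1000, rng=None):
    """Bootstrap estimate of the standard error of `statistic`."""
    if rng is None:
        rng = np.random.default_rng()
    n = len(data)
    # Draw B new samples of size N with replacement from the original data.
    boot = np.array([statistic(rng.choice(data, size=n, replace=True))
                     for _ in range(B)])
    return boot.std(ddof=1)

rng = np.random.default_rng(0)
D_N = rng.exponential(scale=2.0, size=50)
print("bootstrap SE of the median:", bootstrap_se(D_N, np.median, rng=rng))
```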
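For testing, a randomization test can be sketched as follows (illustrative assumptions: paired data, absolute Pearson correlation as test statistic, R = 1000 random shuffles): the association between the two variables is scrambled by shuffling one of them without replacement, as anticipated in the list and detailed in Section 4.6.

```python
# Minimal randomization-test sketch; data, statistic and R are illustrative assumptions.
import numpy as np

def randomization_pvalue(x, y, R=1000, rng=None):
    """Test for association by scrambling y (resampling without replacement)."""
    if rng is None:
        rng = np.random.default_rng()
    observed = abs(np.corrcoef(x, y)[0, 1])
    scrambled = np.array([abs(np.corrcoef(x, rng.permutation(y))[0, 1])
                          for _ in range(R)])
    # p-value: fraction of scrambled datasets at least as extreme as the observed one.
    return (1 + np.sum(scrambled >= observed)) / (R + 1)

rng = np.random.default_rng(0)
x = rng.normal(size=40)
y = 0.5 * x + rng.normal(size=40)
print("randomization p-value:", randomization_pvalue(x, y, rng=rng))
```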
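Finally, a minimal two-sample permutation-test sketch (illustrative assumptions: difference of means as test statistic, P = 1000 random permutations, simulated samples): the two samples are pooled, repeatedly permuted and re-split, as detailed in Section 4.7.

```python
# Minimal permutation-test sketch; data, statistic and P are illustrative assumptions.
import numpy as np

def permutation_pvalue(a, b, P=1000, rng=None):
    """Two-sample test: permute the pooled data, re-split, recompute the statistic."""
    if rng is None:
        rng = np.random.default_rng()
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    diffs = np.array([abs(p[:len(a)].mean() - p[len(a):].mean())
                      for p in (rng.permutation(pooled) for _ in range(P))])
    # p-value: fraction of permuted datasets at least as extreme as the observed one.
    return (1 + np.sum(diffs >= observed)) / (P + 1)

rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, size=30)
b = rng.normal(loc=0.5, size=30)
print("permutation p-value:", permutation_pvalue(a, b, rng=rng))
```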