4.2 Estimation of arbitrary statistics

Consider a set $D_ N$ of $N$ data points sampled from a scalar r.v. $\mathbf z$. Suppose that our quantity of interest is the mean of $\mathbf z$. It is straightforward to derive the bias and the variance of the estimator $\hat{\boldsymbol {\mu }}$:

$$ \hat{\mu }=\frac{1}{N} \sum _{i=1}^ N z_ i, \qquad \text{Bias}[\hat{\boldsymbol {\mu }}]=0, \qquad \text{Var}\left[\hat{\boldsymbol {\mu }} \right]=\frac{\sigma ^2}{N} $$
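The two properties above can be checked numerically. The following is a minimal sketch, assuming i.i.d. Gaussian samples; the values of `mu`, `sigma`, `N` and the number of trials are arbitrary illustrative choices, not taken from the text. It draws many datasets $D_N$, computes $\hat{\mu}$ on each, and compares the empirical bias and variance with $0$ and $\sigma^2/N$.

```python
import numpy as np


def simulate_sample_mean(mu=1.0, sigma=2.0, N=20, n_trials=100_000, seed=0):
    """Monte Carlo estimate of the bias and variance of the sample mean.

    All default values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Each row is one dataset D_N made of N i.i.d. draws of z.
    samples = rng.normal(loc=mu, scale=sigma, size=(n_trials, N))
    mu_hat = samples.mean(axis=1)   # one estimate per dataset
    bias = mu_hat.mean() - mu       # empirical E[mu_hat] - mu
    var = mu_hat.var()              # empirical Var[mu_hat]
    return bias, var


bias, var = simulate_sample_mean()
print(f"empirical bias     = {bias:.4f} (theory: 0)")
print(f"empirical variance = {var:.4f} (theory: {2.0**2 / 20:.4f})")
```

Such a check is only possible here because the sampling distribution is known and chosen by us; with real data only a single $D_N$ is available.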

Consider now another quantity of interest, for example the median or a mode of the distribution. While it is easy to design a plug-in estimator of these quantities, its accuracy is difficult to compute. In other words, given an arbitrary estimator $\hat{\boldsymbol {\theta }}$, the analytical form of the variance $\text {Var}\left[\hat{\boldsymbol {\theta }} \right]$ and of the bias $\text {Bias}[\hat{\boldsymbol {\theta }}]$ is typically not available.


Suppose we want to estimate the skewness of a random variable $\mathbf z$ on the basis of a collected dataset $D_ N$. A plug-in estimator could be

$$ \hat{\gamma }=\frac{\frac{1}{N} \sum _{i=1}^ N (z_ i-\hat{\mu })^3}{\hat{\sigma }^3} $$
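As an illustration, the plug-in skewness can be written in a few lines. This is a sketch under one assumption that the text does not state: $\hat{\sigma}$ is taken to be the plug-in standard deviation, which divides by $N$ rather than $N-1$. The function name `plugin_skewness` and the exponential test sample are hypothetical choices made for the example.

```python
import numpy as np


def plugin_skewness(z):
    """Plug-in estimate of the skewness of the sample z."""
    z = np.asarray(z, dtype=float)
    mu_hat = z.mean()
    # Assumption: plug-in standard deviation (ddof=0, i.e. divide by N).
    sigma_hat = z.std()
    return np.mean((z - mu_hat) ** 3) / sigma_hat ** 3


# Illustrative dataset D_N: 50 draws from an exponential distribution.
rng = np.random.default_rng(0)
D_N = rng.exponential(scale=1.0, size=50)
print(plugin_skewness(D_N))
```

The function returns a single number for the observed dataset; it says nothing about how much that number would change with a different sample.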

What about the accuracy (e.g. bias, variance) of this estimator?

$\bullet $


Let us consider an example of estimation taken from a medical experimental study [41]. The goal of the study is to show bioequivalence between an old and a new version of a patch designed to infuse a certain hormone into the blood. Eight subjects take part in the study. Each subject's hormone level is measured after wearing three different patches: a placebo, an “old” patch and a “new” patch. The Food and Drug Administration (FDA) has established that the new patch will be approved for sale only if it is bioequivalent to the old one according to the following criterion:

\begin{equation} \theta =\frac{|E(\mbox{new})-E(\mbox{old})|}{E(\mbox{old})-E(\mbox{placebo})}\le 0.2 \tag{4.2.1} \label{eq:FDA} \end{equation}

Let us consider the following plug-in estimator of \eqref{eq:FDA}

$$ \hat{\boldsymbol {\theta }}=\frac{|\hat{\boldsymbol {\mu}}_{\mbox{new}}-\hat{\boldsymbol {\mu}}_{\mbox{old}}|}{\hat{\boldsymbol {\mu}}_{\mbox{old}}-\hat{\boldsymbol {\mu }}_{\mbox{placebo}}} $$

Suppose we have collected the following data (details in [41])

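| subject | placebo | old | new | z = old - placebo | y = new - old |
|---:|---:|---:|---:|---:|---:|
| 1 | 9243 | 17649 | 16449 | 8406 | -1200 |
| 2 | 9671 | 12013 | 14614 | 2342 | 2601 |
| 3 | 11792 | 19979 | 17274 | 8187 | -2705 |
| ... | ... | ... | ... | ... | ... |
| 8 | 18806 | 29044 | 26325 | 10238 | -2719 |
| mean | | | | 6342 | -452.3 |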

The estimate is

$$ \hat{\theta }=t(\hat{F})=\frac{|\hat{\mu}_{\mbox{new}}-\hat{\mu }_{\mbox{old}}|}{\hat{\mu}_{\mbox{old}}-\hat{\mu }_{\mbox{placebo}}}=\frac{|\hat{\mu}_{y}|}{\hat{\mu }_{z}}=\frac{452.3}{6342}\approx 0.07 $$
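The arithmetic can be reproduced with a short sketch. The function `plugin_theta` is a hypothetical helper that computes the plug-in ratio from three columns of measurements. Since only part of the dataset is listed here, the numerical check below relies on the two reported column means rather than on the individual rows.

```python
import numpy as np


def plugin_theta(placebo, old, new):
    """Plug-in estimate |mean(new) - mean(old)| / (mean(old) - mean(placebo))."""
    placebo, old, new = (np.asarray(a, dtype=float) for a in (placebo, old, new))
    return abs(new.mean() - old.mean()) / (old.mean() - placebo.mean())


# Column means reported in the table (not recomputed from the rows).
mean_z = 6342.0    # mean of z = old - placebo
mean_y = -452.3    # mean of y = new - old

# Since mean(y) = mean(new) - mean(old) and mean(z) = mean(old) - mean(placebo),
# the estimate reduces to |mean(y)| / mean(z).
theta_hat = abs(mean_y) / mean_z
print(round(theta_hat, 4))   # 0.0713
print(theta_hat <= 0.2)      # True: the point estimate is below the threshold
```

The comparison with $0.2$ concerns the point estimate only; it does not account for the sampling variability of $\hat{\boldsymbol{\theta}}$.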

Can we say, on the basis of this value, that the new patch satisfies the FDA criterion in \eqref{eq:FDA}? What about the accuracy (e.g. bias or variance) of the estimator? The techniques introduced in the following sections may provide an answer to these questions.

$\bullet $