3.5.4 Bias/variance decomposition of MSE

Bias and variance are two independent criteria for assessing the quality of an estimator. As shown in Figure 3.3, two estimators can behave in opposite ways: the first has large bias and low variance, while the second has large variance and small bias. How can we choose between them? We need a measure that combines the two into a single criterion. This is the role of the mean-square error (MSE).

When $\hat{\boldsymbol {\theta }}$ is a biased estimator of $\theta $, its accuracy is usually assessed by its MSE rather than simply by its variance. The MSE is defined by

$$ \text {MSE}=E_{{\mathbf D}_ N}[(\theta -\hat{\boldsymbol{\theta }})^2] $$

For a generic estimator it can be shown that

\begin{equation} \text {MSE}=(E[\hat{\boldsymbol {\theta}}]-\theta )^2+\text {Var}\left[\hat{\boldsymbol {\theta }}\right]=\left[\text {Bias}[\hat{\boldsymbol {\theta }}]\right]^2+\text {Var}\left[\hat{\boldsymbol {\theta }} \right] \end{equation}

i.e., the mean-square error is equal to the sum of the variance and the squared bias of the estimator. The analytical derivation is as follows:

\begin{align} \mbox{MSE}& =E_{{\mathbf D}_ N}[(\theta -\hat{\boldsymbol {\theta }})^2]=E_{{\mathbf D}_ N}[(\theta-E[\hat{\boldsymbol {\theta }}]+E[\hat{\boldsymbol {\theta}}]-\hat{\boldsymbol {\theta }})^2]\\ & =E_{{\mathbf D}_N}[(\theta -E[\hat{\boldsymbol {\theta }}])^2]+ E_{{\mathbf D}_N}[(E[\hat{\boldsymbol {\theta }}]-\hat{\boldsymbol {\theta}})^2] +E_{{\mathbf D}_ N}[2(\theta -E[\hat{\boldsymbol {\theta}}])(E[\hat{\boldsymbol {\theta }}]-\hat{\boldsymbol {\theta}})] \\ & =E_{{\mathbf D}_ N}[(\theta -E[\hat{\boldsymbol{\theta }}])^2]+ E_{{\mathbf D}_ N}[(E[\hat{\boldsymbol {\theta}}]-\hat{\boldsymbol {\theta }})^2] +2(\theta -E[\hat{\boldsymbol{\theta }}])(E[\hat{\boldsymbol {\theta }}]-E[\hat{\boldsymbol{\theta }}])\\ & =(E[\hat{\boldsymbol {\theta }}]-\theta)^2+\text {Var}\left[\hat{\boldsymbol {\theta }} \right] \end{align}

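The cross term vanishes because $\theta -E[\hat{\boldsymbol {\theta }}]$ is a deterministic quantity and $E_{{\mathbf D}_ N}[E[\hat{\boldsymbol {\theta }}]-\hat{\boldsymbol {\theta }}]=0$. This decomposition is typically called the bias-variance decomposition. Note that if an estimator is unbiased, its MSE is equal to its variance.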
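The decomposition can also be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original text. It assumes a Gaussian distribution with true mean $\theta = 2$ and standard deviation $\sigma = 1$, datasets of size $N = 10$, and two illustrative estimators of the mean: the sample mean (unbiased) and a hypothetical "shrunk" version equal to half the sample mean (biased, with lower variance). The helper name `mse_decomposition` and all parameter values are assumptions chosen for illustration.

```python
import numpy as np


def mse_decomposition(estimates, theta):
    """Return (mse, squared_bias, variance) computed from simulated estimates.

    `estimates` holds one value of the estimator per simulated dataset;
    `theta` is the true parameter value.
    """
    estimates = np.asarray(estimates, dtype=float)
    mse = np.mean((theta - estimates) ** 2)
    bias = np.mean(estimates) - theta
    variance = np.var(estimates)  # ddof=0, so the identity holds exactly
    return mse, bias ** 2, variance


# Illustrative settings (assumptions, not taken from the text)
rng = np.random.default_rng(0)
theta = 2.0    # true mean
sigma = 1.0    # true standard deviation
N = 10         # size of each dataset D_N
R = 100_000    # number of simulated datasets

# Each row is one dataset D_N drawn from the assumed distribution
datasets = rng.normal(theta, sigma, size=(R, N))

sample_mean = datasets.mean(axis=1)   # unbiased estimator of theta
shrunk_mean = 0.5 * sample_mean       # biased estimator with lower variance

for name, est in [("sample mean", sample_mean), ("shrunk mean", shrunk_mean)]:
    mse, bias2, var = mse_decomposition(est, theta)
    print(f"{name}: MSE={mse:.4f}  bias^2={bias2:.4f}  "
          f"var={var:.4f}  bias^2+var={bias2 + var:.4f}")
```

Under these assumptions, the sample mean has zero bias and variance $\sigma^2/N = 0.1$, while the shrunk estimator has squared bias $(0.5\,\theta)^2 = 1$ and variance $0.25\,\sigma^2/N = 0.025$. The simulated values should be close to these figures, and in both cases the empirical MSE should match the sum of squared bias and variance up to floating-point error.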