3.5.3 Bias of the estimator $\hat \sigma^2$

Let us now study the bias of the estimator of the variance of $\mathbf z$.

\begin{align}
E_{\mathbf{D}_N}[\hat{\boldsymbol{\sigma}}^2] &= E_{\mathbf{D}_N}\left[\frac{1}{N-1} \sum_{i=1}^N (\mathbf{z}_i-\hat{\boldsymbol{\mu}})^2 \right] \tag{3.5.11} \label{eq-biasesi} \\
&= \frac{N}{N-1} E_{\mathbf{D}_N}\left[ \frac{1}{N} \sum_{i=1}^N (\mathbf{z}_i-\hat{\boldsymbol{\mu}})^2 \right] \\
&= \frac{N}{N-1} E_{\mathbf{D}_N}\left[\left(\frac{1}{N} \sum_{i=1}^N \mathbf{z}^2_i\right)-\hat{\boldsymbol{\mu}}^2 \right]
\end{align}

Since $E[\mathbf{z}^2]=\mu^2+\sigma^2$ and $\text{Cov}[\mathbf{z}_i,\mathbf{z}_j]=0$ for $i \neq j$, the first term inside the $E[\cdot]$ is

$$ E_{\mathbf{D}_N}\left[\frac{1}{N} \sum_{i=1}^N \mathbf{z}^2_i\right]= \frac{1}{N}\sum_{i=1}^N E_{\mathbf{D}_N}\left[ \mathbf{z}^2_i \right] = \frac{1}{N} N (\mu^2+\sigma^2) = \mu^2+\sigma^2 $$

Since $E\left[\left(\sum_{i=1}^N \mathbf{z}_i\right)^2\right]=\left(E\left[\sum_{i=1}^N \mathbf{z}_i\right]\right)^2+\text{Var}\left[\sum_{i=1}^N \mathbf{z}_i\right]=N^2\mu^2+N \sigma^2$, the second term is

$$ E_{\mathbf{D}_N}[\hat{\boldsymbol{\mu}}^2]=\frac{1}{N^2}E_{\mathbf{D}_N}\left[\left(\sum_{i=1}^N \mathbf{z}_i\right)^2\right]= \frac{1}{N^2} (N^2 \mu^2+N \sigma^2)=\mu^2+ \sigma^2/N $$

It follows that

\begin{align*}
E_{\mathbf{D}_N}[\hat{\boldsymbol{\sigma}}^2] &=\frac{N}{N-1} \left( (\mu^2+\sigma^2)-(\mu^2+\sigma^2/N) \right) \\
&=\frac{N}{N-1}\left( \frac{N-1}{N} \sigma^2 \right) = \sigma^2
\end{align*}

This result justifies our definition (3.3.7): once $N-1$ is used in the denominator, the sample variance estimator is unbiased.
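
A quick numerical check of this result is given by the following sketch, which compares the $1/(N-1)$ estimator with the biased $1/N$ (maximum-likelihood) estimator; the sample size, the number of replications and the value of $\sigma^2$ are illustrative assumptions.

R code
# sketch: empirical check of the bias of the two variance estimators
set.seed(0)
N<-10          # sample size
R<-100000      # number of Monte Carlo replications
sigma2<-4      # true variance of z

var.unb<-numeric(R)   # 1/(N-1) estimator (R's var())
var.ml<-numeric(R)    # 1/N estimator
for (r in 1:R){
  D<-rnorm(N,mean=0,sd=sqrt(sigma2))
  var.unb[r]<-var(D)
  var.ml[r]<-sum((D-mean(D))^2)/N
}
mean(var.unb)   # close to sigma2 = 4
mean(var.ml)    # close to (N-1)/N*sigma2 = 3.6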

Some points are worth considering:

  • The results (3.5.9), (3.5.10) and (3.5.11) are independent of the family of the distribution $F(\cdot )$.

  • According to (3.5.10), the variance of $\hat{\boldsymbol{\mu}}$ is $1/N$ times the variance of $\mathbf{z}$. This formally justifies why averaging over a large number of samples is recommended: the larger $N$, the smaller $\text{Var}\left[\hat{\boldsymbol{\mu}}\right]$, so for a given $\sigma^2$ a larger $N$ yields a better estimate of $\mu$ (see the first sketch after this list).

  • According to the central limit theorem, under quite general conditions on the distribution $F_{\mathbf z}$, the distribution of $\hat{\boldsymbol {\mu }}$ will be approximately normal as $N$ gets large, which we can write as

    $$ \hat{\boldsymbol{\mu}} \sim \mathcal{N}(\mu,\sigma^2/N) \quad \text{for } N \rightarrow \infty $$

  • The standard error $\sqrt{\text{Var}\left[\hat{\boldsymbol{\mu}}\right]}$ is a common way of indicating statistical accuracy. Roughly speaking, if the estimator is unbiased and the conditions of the central limit theorem apply, we expect $\hat{\boldsymbol{\mu}}$ to be less than one standard error away from $\mu$ about 68$\%$ of the time, and less than two standard errors away from $\mu$ about 95$\%$ of the time (see Table 2.3 and the second sketch after this list).
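
The $1/N$ scaling of $\text{Var}\left[\hat{\boldsymbol{\mu}}\right]$ can be checked with the following sketch; the sample sizes, the number of replications and the value of $\sigma^2$ are illustrative assumptions.

R code
# sketch: the variance of the sample mean shrinks as sigma^2/N
set.seed(0)
sigma2<-4
R<-10000
for (N in c(10,100,1000)){
  mu.hat<-replicate(R,mean(rnorm(N,mean=0,sd=sqrt(sigma2))))
  cat("N=",N," empirical Var=",var(mu.hat),
      " theoretical sigma^2/N=",sigma2/N,"\n")
}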
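
Similarly, the one- and two-standard-error coverage figures can be checked empirically; again the Gaussian setting and the numerical values are illustrative assumptions.

R code
# sketch: fraction of replications in which mu.hat falls within
# one and two standard errors of mu (expected about 0.68 and 0.95)
set.seed(0)
N<-30
mu<-0
sdev<-2
R<-100000
se<-sdev/sqrt(N)     # standard error of mu.hat
mu.hat<-replicate(R,mean(rnorm(N,mean=mu,sd=sdev)))
mean(abs(mu.hat-mu)<se)     # approximately 0.68
mean(abs(mu.hat-mu)<2*se)   # approximately 0.95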

Script

You can visualize the bias and variance of the sample variance estimator by running the following R script.

R code
# script sam_dis2.R
# it visualizes the distribution of the estimator
# of the variance of a gaussian random variable

par(ask=TRUE)
N<-10
mu<-0
sdev<-10
R<-10000

I<-seq(-50,50,by=.5)
p<-dnorm(I,mean=mu,sd=sdev)
plot(I,p,type="l",
     main=paste("Distribution of  r.v. z: var=",sdev^2))

var.hat<-array(0,dim=c(R,1))  # sample variance estimates
std.hat<-array(0,dim=c(R,1))  # sample standard deviation estimates
for (r in 1:R){
  # generate a dataset of N observations from N(mu,sdev^2)
  D<-rnorm(N,mean=mu,sd=sdev)
 
  var.hat[r,1]<-var(D)
  std.hat[r,1]<-sd(D)
}

I2<-seq(0,2*sdev^2,by=.5)  # grid of values for plotting the densities
hist(var.hat,freq=FALSE,
     main= paste("Variance estimator on N=",N, " samples: mean=",mean(var.hat))) #,xlim=range(I2))


# (N-1)*var.hat/sigma^2 follows a chi-square distribution with N-1 d.o.f.
ch<-(var.hat*(N-1))/(sdev^2)
hist(ch,freq=FALSE)
p.var.hat<-dchisq(I2,df=N-1)
lines(I2,p.var.hat,type="l")