Estimation Theory – Estimating the Variance of Gaussian Distribution from a Noisy Sample

estimation, measurement-error, noise

I have measured a large data sample from an underlying Gaussian distribution and want to estimate the variance and its error. However, the measured values are noisy, with Gaussian noise whose standard deviation is approximately known. How can I estimate the error of the sample variance in this case? First I computed the mean squared error $$\frac{2}{n-1} \sigma^4,$$
which does not take the noise into account and seems to be too small.

What I also do not understand is how one can compute the MSE when one does not know the true distribution, and therefore does not know the true $\sigma$.

Best Answer

I would answer this question using Bayesian estimation with a "non-informative" prior.

notation and setup

Data model: $(y_i\mid\mu,\sigma^2)\sim N(\mu,\sigma^2+\sigma_e^2)$ for $i=1,\dots,n$

Prior: $p(\mu,\sigma^2)\propto(\sigma^2+\sigma_e^2)^{-2}$

The prior does not favour either variance source: we have $\frac{\sigma^2}{\sigma^2+\sigma_e^2}\sim U(0,1)$, i.e. the variance proportion is uniformly distributed a priori. All this leads to a marginal posterior for $\sigma^2$ of

$$p (\sigma^2|DI)\propto(\sigma^2+\sigma_e^2)^{-(n+1)/2-1}\exp\left(-\frac {(n-1)s^2}{2(\sigma^2+\sigma_e^2)}\right)$$

This is a "truncated" inverse gamma distribution for $(\sigma^2+\sigma_e^2)$ with shape parameter $\frac{n+1}{2}$ and scale parameter $\frac{(n-1)s^2}{2}$, where $s^2=\frac{1}{n-1}\sum_{i=1}^n (y_i-\overline{y})^2$ and $\overline{y}=\frac{1}{n}\sum_{i=1}^n y_i$. The truncation is that $(\sigma^2+\sigma_e^2)>\sigma_e^2$ (equivalently, $\sigma^2>0$).
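To make the setup concrete, here is a minimal sketch that computes $s^2$ and the posterior's shape and scale parameters from data. The data here are simulated with hypothetical true values ($\sigma^2=4$, $\sigma_e^2=1$), purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: simulate n noisy measurements whose total
# variance is sigma^2 + sigma_e^2 (true sigma^2 = 4, noise variance = 1).
n, sigma2_true, sigma_e2 = 200, 4.0, 1.0
y = rng.normal(0.0, np.sqrt(sigma2_true + sigma_e2), size=n)

ybar = y.mean()
s2 = ((y - ybar) ** 2).sum() / (n - 1)   # sample variance s^2

# Parameters of the (truncated) inverse gamma posterior for sigma^2 + sigma_e^2
shape = (n + 1) / 2
scale = (n - 1) * s2 / 2
```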

how to make use of this

First I would check whether the truncation matters by calculating the probability that $(\sigma^2+\sigma_e^2)\leq \sigma_e^2$ using the inverse gamma CDF. A quick "eyeball" check is whether $s^2\gg\sigma_e^2$ (i.e. most of the variation in the data does not come from measurement error).
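The truncation check can be done directly with `scipy.stats.invgamma`. A sketch, using hypothetical values of $n$, $s^2$, and $\sigma_e^2$:

```python
from scipy import stats

# Hypothetical posterior inputs: n observations, sample variance s2,
# known measurement-noise variance sigma_e2.
n, s2, sigma_e2 = 200, 5.0, 1.0

shape = (n + 1) / 2
scale = (n - 1) * s2 / 2

# P(sigma^2 + sigma_e^2 <= sigma_e^2) under the untruncated inverse gamma;
# if this probability is tiny, the truncation can safely be ignored.
p_trunc = stats.invgamma(a=shape, scale=scale).cdf(sigma_e2)
```

Here $s^2 = 5\sigma_e^2$ with $n=200$, so the truncation probability is negligible and the untruncated formulae below apply.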

If the truncation is not important, we can just use the plain inverse gamma. The expected value of the above posterior is $E(\sigma^2|DI)\approx s^2-\sigma_e^2$ and the variance is given as $$var(\sigma^2|DI)=var(\sigma^2+\sigma_e^2|DI)\approx\frac {2}{n-3}\left [E(\sigma^2+\sigma_e^2|DI)\right]^2\approx\frac {2}{n-3}s^4.$$ You need $n>3$ for these formulae to apply.
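The untruncated point estimate and its posterior variance are then a one-liner each. A sketch with the same hypothetical values as above:

```python
# Hypothetical inputs: sample size, sample variance, known noise variance.
n, s2, sigma_e2 = 200, 5.0, 1.0

est_sigma2 = s2 - sigma_e2             # E(sigma^2 | D I), noise-corrected
var_sigma2 = 2.0 / (n - 3) * s2 ** 2   # var(sigma^2 | D I); requires n > 3
sd_sigma2 = var_sigma2 ** 0.5          # posterior standard deviation
```

Note that the uncertainty is driven by $s^4$, not $(s^2-\sigma_e^2)^2$: subtracting the known noise variance shifts the estimate but does not shrink its error bar.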

This is pretty much your expression but with $\sigma^2$ replaced by $s^2$.

If the truncation matters, then you'll need to work in terms of the incomplete gamma function. If you define the function $f(k)=\gamma\left(\frac{n+1}{2}+k,\ \frac{(n-1)s^2}{2\sigma_e^2}\right)$, where $\gamma(\cdot,\cdot)$ is the lower incomplete gamma function, then the expected value is $$E(\sigma^2|DI)=s^2\frac{n-1}{2}\frac{f(-1)}{f(0)}-\sigma_e^2$$ and the variance is given by $$var(\sigma^2|DI)=\left[E(\sigma^2+\sigma_e^2|DI)\right]^2\left[\frac{f(-2)f(0)}{f(-1)^2} - 1\right]$$
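These truncated-moment formulae can be evaluated with `scipy.special.gammainc` (the regularized lower incomplete gamma $P(a,x)=\gamma(a,x)/\Gamma(a)$), rescaling by $\Gamma(a)$ to recover the unregularized $\gamma(a,x)$. A sketch, with hypothetical values of $n$, $s^2$, and $\sigma_e^2$ chosen so the truncation is more plausible:

```python
import numpy as np
from scipy.special import gammainc, gammaln

# Hypothetical inputs where s^2 is only 2x the noise variance.
n, s2, sigma_e2 = 20, 2.0, 1.0

alpha = (n + 1) / 2
T = (n - 1) * s2 / (2 * sigma_e2)   # argument of the incomplete gamma

def f(k):
    # Unregularized lower incomplete gamma: gamma(alpha + k, T)
    a = alpha + k
    return gammainc(a, T) * np.exp(gammaln(a))

E_v = s2 * (n - 1) / 2 * f(-1) / f(0)   # E(sigma^2 + sigma_e^2 | D I)
est_sigma2 = E_v - sigma_e2             # E(sigma^2 | D I)
var_sigma2 = E_v ** 2 * (f(-2) * f(0) / f(-1) ** 2 - 1)
```

As $T\to\infty$ the ratios $f(-1)/f(0)$ and $f(-2)f(0)/f(-1)^2$ recover the untruncated moments, so this reduces to the simpler formulae when $s^2\gg\sigma_e^2$.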
