Solved – Propagation of uncertainty through an average

error, error-propagation, standard-error, uncertainty

I have a set of distance measurements that are all accurate to +/- 0.01 m.

 {1.00, 2.00, 3.00}

We can obtain the distance moved between measurements by taking {2-1, 3-2}; it's trivial to see we moved 1 m each time.

My question is this: if you want to know the average distance moved, how do you carry the +/- 0.01 m through the average?

I would like to report the Average Difference +/- the uncertainty. How do I calculate the uncertainty?

(My real data is messier than this.)

Best Answer

Ok, there are two issues here. The first is the general question of how to use known uncertainty in estimating the mean and variance. The second is the specific issue relating to the fact that you are taking differences.

In general:

In a more general situation, one might have to average a number of measurements, each with known standard error $\sigma$. In that case the total variance is the sum of the sample variance and the measurement variance. This is analogous to ANOVA, where the total variance is the sum of the between-groups and within-groups variance. Imagine each measurement was actually a little subsample group of repeated measurements; then this is exactly what you would have.

To do this more rigorously, we assume the following generative model

$X = Z + \epsilon$

where $Z \sim N(\mu, \sigma_Z^2)$ is our "true value distribution" and $\epsilon\sim N(0,\sigma^2_M)$ is our measurement error. We want the tightest bounds on our estimate of $\mu$, calculated by $\hat\mu = \frac{1}{N}\sum_i X_i$.

The variance of our estimator is

$Var(\hat\mu) = \frac{\sigma^2_Z+\sigma^2_M}{N}$

where $\sigma^2_Z$ is unknown, and must be estimated from the sample.
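
As a quick sanity check, here is a minimal simulation sketch of this setup. All the numbers ($\mu$, $\sigma_Z$, $\sigma_M$, $N$) are arbitrary illustrative choices, not taken from the question; it just verifies that the empirical variance of $\hat\mu$ matches the formula above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative values, not taken from the question
mu, sigma_Z, sigma_M, N = 10.0, 0.5, 0.1, 20
n_trials = 100_000

# Simulate the generative model X = Z + epsilon and compute mu_hat in each trial
Z = rng.normal(mu, sigma_Z, size=(n_trials, N))
eps = rng.normal(0.0, sigma_M, size=(n_trials, N))
mu_hat = (Z + eps).mean(axis=1)

print("empirical Var(mu_hat):      ", mu_hat.var())
print("(sigma_Z^2 + sigma_M^2) / N:", (sigma_Z**2 + sigma_M**2) / N)
```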

One way to estimate this is simply to estimate $\sigma_X^2 = \sigma^2_Z + \sigma^2_M$ directly in the usual way, as the sample variance of the $X_i$. The problem is that if you then subtract the known $\sigma^2_M$ to isolate $\sigma^2_Z$, you could get a negative estimate for $\sigma^2_Z$. I think simply using the sample variance as it stands is generally acceptable, assuming that the measurement error is small, or that it is not accurately known: this corresponds to ignoring the measurement error and acting as normal, since the measurement error is already included in the sample.
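
To make this concrete, here is a short sketch using made-up measurements and a made-up known measurement error (neither is from the question): it computes the sample variance, the naive $\sigma^2_Z$ estimate obtained by subtracting the known $\sigma^2_M$ (the quantity that can come out negative), and the standard error of the mean from the sample variance alone.

```python
import numpy as np

# Made-up measurements and measurement error, for illustration only
x = np.array([9.87, 10.12, 10.05, 9.95, 10.21, 9.90])
sigma_M = 0.05                       # known measurement standard error (assumed)
N = len(x)

s2_X = x.var(ddof=1)                 # sample variance; already contains the measurement noise
s2_Z_naive = s2_X - sigma_M**2       # naive estimate of sigma_Z^2; can come out negative

# The "just ignore the measurement error" route: SE of the mean from the sample variance
se_mean = np.sqrt(s2_X / N)

print(f"mean = {x.mean():.3f} +/- {se_mean:.4f}")
print(f"naive sigma_Z^2 estimate = {s2_Z_naive:.5f} (would be a problem if negative)")
```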

However, there must be a better way to estimate $\sigma^2_Z$ from the sample that takes the known part of the variance into account. I shall think on this and come back if I figure it out.

Specific to your example:

In your specific example there is a slight peculiarity: the average difference does not depend on the middle measurements, only on the ends.

$\bar{\Delta} = \frac{1}{N}\left[(X_1-X_0) + (X_2-X_1) + \cdots+(X_N-X_{N-1})\right] = \frac{X_N-X_0}{N}$

Put another way, the differences you are averaging are not independent. Since the value of $\bar\Delta$ depends only on $X_0$ and $X_N$, and not on the intermediate measurements $X_1, \ldots, X_{N-1}$, you can simply plug in the variance estimates in the usual manner.
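
As a quick numeric check of the telescoping identity, using the question's own values:

```python
import numpy as np

x = np.array([1.00, 2.00, 3.00])          # the question's measurements X_0, X_1, X_2

diffs = np.diff(x)                        # {2 - 1, 3 - 2}
mean_diff = diffs.mean()                  # average difference
telescoped = (x[-1] - x[0]) / len(diffs)  # (X_N - X_0) / N

print(mean_diff, telescoped)              # both are 1.0
```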

Assuming that the $X_i$ are independent then

$Var(\bar\Delta) = \frac{Var(X_N) + Var(X_0)}{N^2}$

And you can use the method above to estimate the variance of $X_i$.
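
Putting the pieces together for the question's numbers, here is a minimal sketch. It treats the stated +/- 0.01 m as one standard error per measurement, and takes the endpoint variances to be just the measurement variance (i.e. no additional spread from the method above); both of these are assumptions made for the illustration.

```python
import numpy as np

x = np.array([1.00, 2.00, 3.00])   # the question's measurements X_0, ..., X_N
sigma_M = 0.01                     # +/- 0.01 m treated as one standard error (assumption)

N = len(x) - 1                     # number of differences
mean_diff = (x[-1] - x[0]) / N

# Var(mean_diff) = (Var(X_N) + Var(X_0)) / N^2, with Var(X_0) = Var(X_N) = sigma_M^2 here
se_mean_diff = np.sqrt(2.0 * sigma_M**2) / N

print(f"average difference = {mean_diff:.2f} +/- {se_mean_diff:.4f} m")
```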