[Math] Standard deviation of the mean of a set of imprecise numbers

statistics, stochastic-processes

I have a problem which seems very simple, but for some reason I cannot figure out exactly what I have to do.

Let's say I have a set of derived values, where each of them has an individual error:
$$X_{all}=(x_1 \pm \sigma_{x_1}, x_2 \pm \sigma_{x_2}, …, x_n \pm \sigma_{x_n})$$
(where $\sigma_{x_i}$ denotes the standard deviation of $x_i$).

Now I want the average value of $X_{all}$ and some measure of confidence in that value. The average is of course: $X_{avg}=\frac{1}{N}\sum x_i$

But for the standard deviation of $X_{avg}$, I don't know what I should use. There are two possibilities that seem reasonable:

1) From error-propagation: $$\sigma_{X_{avg}} = \sqrt{\frac{1}{N} \sum_i \sigma_{x_i}^2}$$

2) The ordinary standard deviation of the values, as one would normally compute alongside a mean:
$$\sigma_{X_{avg}} = \sqrt{\frac{1}{N} \sum_i (x_i-X_{avg})^2}$$
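
To make the two candidates concrete, here is a minimal numerical sketch in Python (the values of $x_i$ and $\sigma_{x_i}$ are made-up numbers, used only for illustration):

```python
import numpy as np

# Hypothetical measured values x_i and their individual uncertainties sigma_x_i
x     = np.array([10.1, 9.8, 10.4, 9.9, 10.2])
sigma = np.array([0.30, 0.20, 0.50, 0.25, 0.40])

N = len(x)
X_avg = x.mean()                               # X_avg = (1/N) * sum(x_i)

# 1) error-propagation candidate, exactly as written above
sigma_prop = np.sqrt(np.sum(sigma**2) / N)

# 2) ordinary spread of the values around their mean
sigma_spread = np.sqrt(np.sum((x - X_avg)**2) / N)

print(X_avg, sigma_prop, sigma_spread)
```

The two numbers generally differ: the first only reflects the individual uncertainties, the second only the scatter of the values.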


Concrete Example:

I perform a measurement to get the value $X$. In order to get statistically significant knowledge about $X$ and its standard deviation, I perform the measurement $n$ times, leading to the results $x_i$.

However, I cannot measure $x_i$ directly, but only $y_i=x_i+BG$, where $BG$ is a background value. For each measurement of $y_i$, I automatically get 1000 readings of $BG$, which gives me $BG_{avg}$ and $\sigma_{BG}$ for each $y_i$ ($BG$ is Gaussian distributed). Now I have $x_i = y_i - BG_{avg}$, and thus $\sigma_{x_i}=\sigma_{BG}$.
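
A minimal simulation of one such measurement might look like the following (all numbers are hypothetical, and I assume the background actually contained in $y_i$ is itself one random draw from the Gaussian background distribution):

```python
import numpy as np

rng = np.random.default_rng(1)

x_true       = 10.0        # the quantity we want (hypothetical value)
bg_mu, bg_sd = 2.0, 0.5    # background distribution (hypothetical values)

# One raw measurement: it contains one random realisation of the background.
y_i = x_true + rng.normal(bg_mu, bg_sd)

# Alongside it, the background is sampled 1000 times to characterise it.
bg_samples = rng.normal(bg_mu, bg_sd, size=1000)
bg_avg     = bg_samples.mean()
sigma_bg   = bg_samples.std(ddof=1)

# Background-subtracted value and its uncertainty, as in the question:
x_i       = y_i - bg_avg
sigma_x_i = sigma_bg   # the unknown background draw in y_i fluctuates with sigma_BG
```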


Another Concrete Example:

I want to know the average lap time of a racing car. I measure 100 laps. However, I know my clock has an uncertainty $\sigma_{clock}$. Moreover, for some reason I use a different clock for each lap, each with its own uncertainty $\sigma_{clock_i}$.

So I get 100 values $t_i$ for the time in lap $i$, each with an uncertainty $\sigma_{clock_i}$ corresponding to the uncertainty introduced by the clock itself.

What is the uncertainty of the racing car's average lap time?

Best Answer

You have defined two distinct standard deviations, describing different things.

The first one describes the standard deviation of the average itself - that is, a measure of how accurately we know the average, ignoring the spread of the set of measurements.

The second one describes the standard deviation of the set of measurements, ignoring the confidence of each measurement.

I assume that what you seek is the standard deviation of the distribution of all possible measurements, where each measurement has a distribution described by its individual mean and standard deviation. That's a slightly more complicated problem.

Recall that the variance is defined as $$ \text{Var}(X) = E(X^2)-E(X)^2 $$ Now let $X$ be a single measurement drawn uniformly at random from the set, i.e. $X = X_I$ where $I$ is uniform on $\{1,\dots,N\}$ and each $X_i\sim \mathcal{N}(x_i,\sigma_i)$. Then $E(X)$ is just the average of the $x_i$ values, and $E(X^2)$ is the average of the expected values of the $X_i^2$. Since $$ E(X_i^2) = \text{Var}(X_i)+E(X_i)^2 = \sigma_i^2+x_i^2, $$ $E(X^2)$ is the sum of the average variance and the average of the $x_i^2$ values.

From here, it is easy to see that the final variance is quite simply the sum of the average of the measurement variances and the variance in the measurement values. That is, taking the square root to get the final standard deviation, $$ \sigma = \sqrt{\frac1N\left(\sum_i \left[\sigma_i^2+(x_i-\mu)^2\right]\right)} $$ where $\mu=\frac1N \sum_i x_i$.
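
As a quick sanity check, here is a sketch (under the assumption, as above, that a "measurement" means picking one of the $N$ values uniformly at random and sampling from its individual Gaussian) comparing the formula against a direct simulation:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical measurements and per-measurement uncertainties
x     = np.array([10.1, 9.8, 10.4, 9.9, 10.2])
sigma = np.array([0.30, 0.20, 0.50, 0.25, 0.40])
N = len(x)

mu = x.mean()
# Formula from above: sqrt of (average variance + variance of the values)
sigma_total = np.sqrt(np.mean(sigma**2 + (x - mu)**2))

# Monte Carlo: pick a measurement at random, then sample from its Gaussian
idx     = rng.integers(N, size=1_000_000)
samples = rng.normal(x[idx], sigma[idx])

print(sigma_total, samples.std())   # the two numbers should agree closely
```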
