Solved – Standard deviation of normalized data

error-propagation, standard deviation

I have a data set $y_i$ (where the $y_i$ are photon counts in time period $i$, $i=1,2,\ldots,N$, assumed Poisson), with an estimated standard error $s_i = \sqrt{y_i}$ for each count.

For some reason, I normalize the data set by the average count rate $\bar y$.

$$\bar y = \frac{\sum y_i}{N}$$

where the sum runs over $i=1,\ldots,N$.

The standard deviation of $\bar y$ is calculated as $\sqrt{\frac{\sum (y_i - \bar y)^2}{N-1}}$

To normalize I simply do: $z_i = \frac{y_i}{\bar y}$

At this point how do I calculate the standard error for the $z_i$?

Is error propagation the right thing? $\text{se}(z_i) = z_i \times \sqrt{\left(\frac{s_i}{y_i}\right)^2 + \left(\frac{\text{se}(\bar y)}{\bar y}\right)^2}$

It seems to me that the errors stay of the same order as before, while the count rates change by about one order of magnitude. So I am probably missing something.

So, if I print the ratio of the count rate values to their standard deviations, before the normalization, I obtain:

[  1.06904497   1.07222193  -0.306786     2.22599555   5.04049535
10.44367859  10.37041246   4.71728177  10.85418506  10.99314159
10.18889392   9.20449287  13.24244513  10.70825227  15.64406957
15.41271307  10.06729494  12.88619079  14.76416192   3.90199486
13.33680974  11.57703491   9.38122633  11.53686373  13.32397254
-0.78308901]

And after the normalization (plus error propagation):

[  2.27976751e+00   2.29330204e+00   1.88194867e-01   9.79928633e+00
4.80280889e+01   1.74659904e+02   1.72699185e+02   4.23542991e+01
1.85693431e+02   1.89443907e+02   1.67854381e+02   1.41988475e+02
2.50472302e+02   1.81762548e+02   3.14047817e+02   3.08069283e+02
1.64619984e+02   2.40831046e+02   2.91120670e+02   2.94284485e+01
2.53020286e+02   2.05263389e+02   1.46572585e+02   2.04172711e+02
2.52673822e+02   1.69702062e+02   2.48534464e+01   1.65477547e+01
4.12770323e+00   4.47260177e-01   1.22474257e+00]

Any suggestion really appreciated.

PS: Sorry if the formulas are not well displayed, I don't know how to set the "formula" environment.

Best Answer

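Using the standard rules for error propagation, as described on Wikipedia, the sum can be split into two parts for each element $i$: $a = y_i$ and $b = \sum_j y_j - y_i$. With $f(a,b) = a/(a+b)$ the normalized value is $z_i = N f(a,b)$, so $\sigma_{z_i} = N \sigma_f$ when $N$ is treated as an exact constant. Then, ignoring cross-correlation between the elements of $y$ (so that $a$ and $b$ are uncorrelated):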

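$$
\begin{aligned}
f(a,b) &= \frac{a}{a+b} \\
\sigma^2_f &= \left|\frac{\partial f(a,b)}{\partial a}\right|^2 \sigma_a^2 + \left|\frac{\partial f(a,b)}{\partial b}\right|^2 \sigma_b^2 \\
\sigma^2_f &= \left(\frac{1}{a+b} - \frac{a}{(a+b)^2}\right)^2 \sigma_a^2 + \left(\frac{a}{(a+b)^2}\right)^2 \sigma_b^2
\end{aligned}
$$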
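As a rough illustration, the formula above could be evaluated with NumPy along the following lines. This is only a sketch under stated assumptions: the function name `normalized_errors` and its `sigma` argument are made up for this example, the elements of $y$ are taken to be independent (so $\sigma_b^2$ is the sum of the other elements' variances), and the default $\sigma_i^2 = y_i$ is the Poisson assumption from the question. Since $f = y_i / \sum_j y_j$ equals $z_i / N$, the sketch multiplies $\sigma_f$ by $N$ to get the error on $z_i = y_i / \bar y$.

```python
import numpy as np

def normalized_errors(y, sigma=None):
    """Return z_i = y_i / mean(y) and its propagated standard error.

    Hypothetical helper: splits the sum into a = y_i and b = sum(y) - y_i
    and assumes the elements of y are independent.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # Assumption: Poisson counts, so variance = count, unless sigma is given.
    var = y.copy() if sigma is None else np.asarray(sigma, dtype=float) ** 2

    total = y.sum()          # a + b, the same for every element
    a = y
    b = total - y
    var_a = var
    var_b = var.sum() - var  # variance of the sum of all other elements

    # df/da = 1/(a+b) - a/(a+b)^2 = b/(a+b)^2 ; df/db = -a/(a+b)^2
    var_f = (b / total**2) ** 2 * var_a + (a / total**2) ** 2 * var_b

    z = n * a / total        # identical to y / mean(y)
    se_z = n * np.sqrt(var_f)
    return z, se_z

# Made-up counts, purely for illustration.
counts = np.array([100.0, 400.0, 900.0, 1600.0])
z, se_z = normalized_errors(counts)
print(z)
print(se_z)
```

If the counts have been background-subtracted and can be negative, the Poisson default no longer applies and the per-point errors would need to be passed in explicitly through `sigma`.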
