Error calculation for a relative difference with only one standard deviation (STD)

data analysis, error analysis, measurements, statistics

I have a quantity that is calculated with two measurements, but I only have a tabulated standard deviation for one of the measurements. How would I calculate the error on the calculated quantity?
The form is as follows: $\frac{a - b}{b}$. I have an std on $b$.

The problem is that I'm not sure how the standard deviation of $b$ is calculated. Both $a$ and $b$ are daily ozone measurements, but one is made with a ground-based instrument and the other with a satellite. The ground-based data come with a cataloged std; the satellite data have no error estimate. Should I calculate my own estimate of the error on the satellite data, or use only the cataloged std for the combination (relative difference) of $a$ and $b$? Because of seasonal variations in the data, I don't think I can simply use all the data (or even one year) to estimate a standard deviation.

The ultimate goal is to put a $\sigma$ error bar on the relative difference of $a$ and $b$. Right now I'm just using the given std as an error bar and the results look okay, but I can't justify this approach. To give you an idea, my error bars are shown below. I find similar figures in this validation report, but the errors are not discussed in depth.

[Figure: relative difference vs. latitude, with the current error bars]

Any tips are greatly appreciated!

Best Answer

In general, scientific measurements reported without uncertainties are nearly worthless. The exception is when you have a good method of estimating the uncertainties from the data themselves. This is a very data-specific problem that depends on the characteristics of the quantity being measured and on how it was measured, so no single recipe can be given.

Suppose you have two sets of measurements of something, but you know that this something does not vary and has a unique value. You could then estimate the uncertainty of the measurements in each of the two datasets from the sample standard deviation of each dataset.
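A minimal sketch of that idea, assuming two NumPy arrays of repeated measurements (the function and variable names here are hypothetical, not from the post):

```python
import numpy as np

def constant_quantity_sigmas(ground_values, satellite_values):
    """If the measured quantity is truly constant, the sample standard
    deviation of each dataset directly estimates that instrument's
    measurement uncertainty."""
    return np.std(ground_values, ddof=1), np.std(satellite_values, ddof=1)
```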

In your case I gather that the thing being measured does vary, in which case the above method won't work. However, if you knew that the timescale of any real variation was longer than some timescale $\tau$, then you could estimate the intrinsic standard deviation of the measurements by splitting your dataset into chunks spanning less than $\tau$. Each chunk would give you an estimate (really an upper limit) of the standard deviation, and these estimates could then be averaged to give a final value.
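A rough sketch of that chunking estimate, assuming the series is held as NumPy arrays of times (in days) and values; `chunked_std` and `tau` are illustrative names, not from the original answer:

```python
import numpy as np

def chunked_std(times, values, tau):
    """Estimate the intrinsic measurement scatter by splitting the series
    into windows spanning less than tau (the timescale of real variation).
    Each window's sample std is an upper limit on the measurement std;
    the mean over windows is the final estimate."""
    order = np.argsort(times)
    times, values = np.asarray(times)[order], np.asarray(values)[order]

    stds, start = [], 0
    for i in range(1, len(times) + 1):
        # Close the current window at the end of the data, or once the
        # next point would make the window span tau or more.
        if i == len(times) or times[i] - times[start] >= tau:
            if i - start >= 2:                    # need >= 2 points for a std
                stds.append(np.std(values[start:i], ddof=1))
            start = i
    return np.mean(stds) if stds else np.nan
```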

You say that the measurements may vary "seasonally" but are taken every day. A more sophisticated approach would be to collect the day-to-day differences (i.e. the difference between each consecutive pair of points). Under the assumption that these differences are due only to experimental errors, you can model their distribution to estimate the uncertainty in each measurement. This approach has some intrinsic merit, since it can also reveal any non-Gaussian behaviour in the uncertainties.
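A sketch of the consecutive-differences estimate, under the assumption that the true ozone value changes negligibly from one day to the next compared with the measurement noise (the function name is hypothetical):

```python
import numpy as np

def sigma_from_daily_differences(values):
    """Estimate the per-measurement uncertainty from consecutive-day
    differences. If errors are independent with standard deviation sigma
    and the true value is steady from day to day, the differences
    d_i = x_{i+1} - x_i have variance 2*sigma**2, so sigma = std(d)/sqrt(2)."""
    d = np.diff(np.asarray(values, dtype=float))
    sigma_gauss = np.std(d, ddof=1) / np.sqrt(2)
    # Robust alternative via the median absolute deviation (scaled to match
    # a Gaussian); a large gap between the two hints at non-Gaussian errors.
    sigma_robust = 1.4826 * np.median(np.abs(d - np.median(d))) / np.sqrt(2)
    return sigma_gauss, sigma_robust
```

Once you have estimates for both instruments, standard first-order propagation (assuming independent errors) gives, for $f = \frac{a-b}{b}$, $\sigma_f^2 \approx \left(\frac{\sigma_a}{b}\right)^2 + \left(\frac{a\,\sigma_b}{b^2}\right)^2$, which is the quantity you would plot as the error bar on the relative difference.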

As an aside: if someone presented me with the graph you have shown and told me that the error bars represent the (appropriately combined) errors in the two datasets, I would conclude that either (i) the error bars have been massively overestimated, or (ii) the data have been "fixed". The standard deviation of the plotted points should not be many times smaller than the estimated uncertainties on those points.
