Solved – Error bars, linear regression and “standard deviation” for point

inverse-prediction, regression, standard deviation

I have a set of experimental data points. I performed each measurement in triplicate, so I can plot every data point with the standard deviation of its triplicate. See the picture attached.

In experimental sciences, it is common to report a value with its standard deviation, for example a mean +/- the standard deviation.

I can calculate a linear regression for the data set. If I have the equation of the linear regression, I can calculate x for any y. Let's take y=50 -> x=11.69

Now, is there a way to evaluate the "dispersion" of this extrapolated point? Something like 11.69 +/- something.

I know it should be the other way around, like 50 +/- something for x=11.69, but then I could use the equation to transform it to x.

Basically, what I'm asking is: is there a global "standard deviation" for a complete linear regression?

EDIT:
When I say "any y", I mean that y will not be an experimental value. I choose it to be 50.

Best Answer

Unfortunately, your dream of a single "global SD" for the estimated error in $x$ given a value of $y$ cannot be realized.

If what you care about is the SD of a prediction of $x$ given a value of $y$, then what you should examine is the square root of equation (10) of the linked reference. The same result is provided in equation 5.25 of an online analytical chemistry textbook that I find more generally useful and of whose content you should, as a chemist, be aware.

Say that you generate a standard curve with known values of $x$ and measured values of $y$. The slope of the standard curve by linear regression was $\beta_1$, and the standard deviation about the regression for the standard curve was:

$$s_r=\sqrt{\frac{\sum_i(y_i-\hat{y}_i)^2}{n-2}}$$

where $y_i$ are the individual observed values in the standard curve, $\hat{y}_i$ are the corresponding individual predicted values from the regression, and $n$ is the number of observations making up the standard curve.
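As a rough illustration of the $s_r$ formula, here is a minimal Python sketch. The calibration data `x_std` and `y_std` are made up for the example (they are not the data from the question), and the helper names `fit_line` and `residual_sd` are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def residual_sd(x, y):
    """Standard deviation about the regression, s_r, with n - 2 degrees of freedom."""
    b0, b1 = fit_line(x, y)
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(ss_res / (len(x) - 2))

# Hypothetical standard curve: known x, measured y (invented numbers)
x_std = [0.0, 5.0, 10.0, 15.0, 20.0]
y_std = [1.0, 22.0, 41.0, 62.0, 79.0]

b0, b1 = fit_line(x_std, y_std)
s_r = residual_sd(x_std, y_std)
print(f"slope = {b1:.3f}, intercept = {b0:.3f}, s_r = {s_r:.3f}")
```

The divisor is $n-2$ because two parameters (slope and intercept) are estimated from the standard curve.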

You then make $m$ subsequent measurements of $y$ on a sample with unknown $x$ to estimate that unknown value of $x$, obtaining a mean value $\overline{Y}$. The standard deviation of the estimated value in $x$ based on this value of $\overline{Y}$ is then:

$$s_x=\frac{s_r}{\beta_1}\sqrt{\frac{1}{m}+\frac{1}{n}+\frac{(\overline{Y}-\bar{y})^2}{\beta_1^2\sum_i(x_i-\bar{x})^2}}$$

In this equation individual $x$ values for generating the standard curve were $x_i$ with mean value $\bar{x}$; the corresponding $y$ values for the standard curve had mean value $\bar{y}$.

Note that this standard deviation increases as the observed mean value $\overline{Y}$ for the unknown sample moves away from the mean value $\bar{y}$ determined when generating the standard curve. There thus is no global SD value for all $x$ predicted on the basis of measuring $y$. It must be calculated anew for each unknown sample. You will obtain the most precise results if your unknowns are close to the mean value of the standards used to generate the standard curve.
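To make the dependence on $\overline{Y}$ concrete, here is a minimal Python sketch of the $s_x$ formula. The calibration data are invented for illustration, the function name `inverse_prediction` is hypothetical, and taking the absolute value of the slope (so that the SD stays positive for a decreasing standard curve) is an assumption added here, not part of the formula as written above.

```python
import math

def inverse_prediction(x, y, y_new_mean, m):
    """Estimate x from the mean of m new y measurements.

    Returns (x_hat, s_x), where s_x follows the formula for the
    standard deviation of the inverse prediction.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope of the standard curve
    b0 = y_bar - b1 * x_bar             # intercept
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_r = math.sqrt(ss_res / (n - 2))   # SD about the regression
    x_hat = (y_new_mean - b0) / b1
    # abs(b1) is an assumption so that s_x is positive for negative slopes
    s_x = (s_r / abs(b1)) * math.sqrt(
        1 / m + 1 / n + (y_new_mean - y_bar) ** 2 / (b1 ** 2 * sxx)
    )
    return x_hat, s_x

# Hypothetical standard curve (invented numbers)
x_std = [0.0, 5.0, 10.0, 15.0, 20.0]
y_std = [1.0, 22.0, 41.0, 62.0, 79.0]

# Unknown measured in triplicate (m = 3), once at the mean of the
# standards' y values and once near the edge of the curve
x_center, s_center = inverse_prediction(x_std, y_std, 41.0, m=3)
x_edge, s_edge = inverse_prediction(x_std, y_std, 79.0, m=3)
print(f"center: x = {x_center:.2f} +/- {s_center:.3f}")
print(f"edge:   x = {x_edge:.2f} +/- {s_edge:.3f}")
```

With these made-up numbers, the second call gives a larger $s_x$ than the first because its $\overline{Y}$ lies farther from $\bar{y}$, which is why the value has to be recomputed for each unknown.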

Happily, there is an R package, chemCal, that can perform all these calculations in an even more general setting where different observations have different weights. The function of interest in that package is inverse.predict.
