Solved – N-sigma curves for a non-linear least-squares curve fit

curve fitting, least squares, nonlinear regression

I'm using Python's scipy.optimize.curve_fit routine (which uses non-linear least squares) to fit an exponential function of the form:

f(x) = a * exp(b*x) + c

to a set of data. The result looks like this:

[figure: the data (black triangles) with the fitted exponential curve (blue)]

where the black triangles are the data set and the blue curve is the f(x) fitted by the routine. The output of the process includes the optimal value for each parameter a, b, c, along with a covariance matrix from which I can obtain the standard deviation for each of them, i.e. I get:

$a = p_a \pm e_a$

$b = p_b \pm e_b$

$c = p_c \pm e_c$

where $p_i$ are the optimal values and $e_i$ their standard deviations.
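For concreteness, here is a minimal sketch of that setup (the synthetic data and the starting guess `p0` are illustrative assumptions, not my actual data):

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b, c):
    """Exponential model f(x) = a * exp(b*x) + c."""
    return a * np.exp(b * x) + c

# Illustrative synthetic data standing in for the real data set
rng = np.random.default_rng(0)
x_data = np.linspace(0.0, 4.0, 50)
y_data = f(x_data, 2.5, 0.8, 1.0) + rng.normal(0.0, 2.0, size=x_data.size)

# p0 is just an illustrative starting guess
popt, pcov = curve_fit(f, x_data, y_data, p0=(1.0, 0.5, 0.0))

p_a, p_b, p_c = popt                    # optimal values
e_a, e_b, e_c = np.sqrt(np.diag(pcov))  # standard deviations of the estimates
```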

What I want is to generate the N-$\sigma$ curves (where N is an integer) for the fit. As an example, here's an article where the authors show a similar exponential best fit (although a decaying one) along with its $1\sigma$ upper and lower curves (see caption):

[figure: the article's decaying-exponential best fit with its $1\sigma$ upper and lower curves]

This site gives an example of how to do this, but I'm not sure the procedure is entirely correct.

How can I correctly obtain such curves with the data I have available?

Best Answer

One big problem I see with ordinary least squares is the variance is clearly not constant but increases as $x$ does; I'd be inclined to consider a model where spread is related to mean (possibly proportional to it), suggesting either a generalized nonlinear model or maybe the use of a transformation (such as a log-transform) to stabilize the variance.

Without dealing with that issue, your variance-covariance and standard error estimates are biased.

Since you want to have estimates that are functions of such quantities, addressing this problem will be central to obtaining a usable answer.


In response to comments:

The standard errors of the parameter estimates and the residual standard deviation are built on the assumption of constant variance. This assumption is clearly violated. Consider if (by eye) I draw vertical bars intended to cover about 99.something percent of the local spread at each of several different x-values:

[figure: the data with hand-drawn vertical bars showing the local spread growing with $x$]

If calculations are based on that equal-variance assumption (as the usual standard errors are), then they will likely be of little value.

The authors of the paper your second plot comes from (a page reference would have been nice, by the way; I am not an astronomer, so I have little idea what's going on in the rest of it) appear to have fitted on a y-scale where the spread is not constant, and it indeed seems to increase in a fashion consistent with their error bars, so it's not clear they necessarily have a problem.

The plot in the blog post you link to doesn't look like the one from the paper; it appears to rest on different assumptions. Things that look superficially similar may not be.

If the spread looks roughly constant on the log scale (and there are no actual zeros), then you could fit a nonlinear least squares model of the form $E(\log y) = \log(a e^{bx}+c)$, and then convert that to a fit on the untransformed scale - but beware that expectations on the two scales don't directly correspond. If you have near normality on the log scale you can still get an estimated expectation on the original scale. Failing that, you might be left with either making a particular distributional assumption, or a Taylor approximation to the expectation.
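As a rough sketch of what fitting on the log scale might look like (reusing `f`, `x_data`, and `y_data` from the curve_fit sketch in the question; the starting values are again just illustrative, and it assumes all observations are strictly positive):

```python
import numpy as np
from scipy.optimize import curve_fit

def log_model(x, a, b, c):
    # Mean of log(y) modelled as log(a*exp(b*x) + c); requires a*exp(b*x) + c > 0
    return np.log(a * np.exp(b * x) + c)

# x_data, y_data as in the earlier sketch; all y_data assumed strictly positive
popt_log, pcov_log = curve_fit(log_model, x_data, np.log(y_data), p0=(1.0, 0.5, 1.0))

# Naive back-transform of the fitted curve: on the original scale this is closer
# to a median than a mean unless a correction for the log-scale spread is applied
x_grid = np.linspace(x_data.min(), x_data.max(), 200)
fit_back = np.exp(log_model(x_grid, *popt_log))
```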


Several kinds of intervals

There are several main kinds of intervals people may wish to place on a plot. Here I discuss three:

$\;$ 1. a pointwise (confidence) interval (based on conditional standard error of the mean). This is an interval for where the curve (the conditional mean) may be, for each x-value. That is, specify a particular $x$, and take a vertical slice there. We can produce an estimate of the standard error for the mean at that $x$; a repeated set of such two standard-error intervals* (from different data) at a given $x$ would be expected to include the population mean at that $x$ about 95% of the time if certain assumptions were satisfied.

This seems to be something like what is being attempted in the blog post you mention in your final link, but crucially, looking at the Python code they seem to be ignoring the effect of the parameter covariances, which would make the result wrong.

$\;$ *(We would not say "$2\sigma$" though, since $\sigma$ would conventionally be a population quantity, and this is a sample estimate of something)

$\;$ 2. An overall interval for the mean. This would still be an interval for the mean but one based on global coverage probabilities. (I don't think you're after this.)

$\;$ 3. A (pointwise) prediction interval. This would be an interval with a specified probability of including a new observation at each given $x$. It includes both parameter uncertainty (due to both variances and covariances of parameter estimates) and the observation "noise" (based on a sample estimate of what I'd call $\sigma$). This is what at first glance seems to be going on in the second plot, but I am not 100% certain.

You should clarify what kind of interval you need, and what exactly you mean by "$\sigma$".


Since you aren't really familiar with the usual standard errors and intervals in linear models, let's cover those. (The interval calculations below are all pointwise, not global.)

Standard errors and intervals in linear models

A. Standard error of mean, confidence interval for mean

For a linear model $y=X\beta+\varepsilon$ with $\text{Var}(\varepsilon)=\sigma^2 I$, we have, for a matrix $C$:

$\text{Var}(C\hat{\beta})=\sigma^2C(X'X)^{-1}C'$.

Hence, if $c$ is a row-vector of constants, $\text{se}(c\hat\beta)=\sigma\sqrt{c(X'X)^{-1}c'}$.

If you estimate $\sigma^2$ by $s^2=\frac{1}{n-p}\sum_i e_i^2$ (where $p$ is the number of predictors, including the intercept term), then $\frac{c\hat\beta-c\beta}{\text{se}(c\hat\beta)}$ is a pivotal quantity* with a $t$-distribution with $n-p$ d.f.

A $1-\alpha$ confidence interval for the mean (at $c=x^*$) is therefore $x^* \hat\beta \pm t_{1-\alpha/2}\, s \sqrt{x^* (X'X)^{-1} {x^*}'}$, or more succinctly, $\hat{y} \pm t_{1-\alpha/2}\, s \sqrt{x^* (X'X)^{-1} {x^*}'}$.

* basically a pivotal quantity is a function of a parameter or parameters and data whose distribution doesn't depend on the parameter(s)
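A small numeric illustration of A (the toy straight-line data and design matrix are my own, purely to show the algebra above):

```python
import numpy as np
from scipy import stats

# Toy straight-line data purely to illustrate the algebra above
rng = np.random.default_rng(1)
n = 30
x = np.linspace(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x])            # design matrix: intercept + slope
y = X @ np.array([1.0, 2.0]) + rng.normal(0.0, 1.0, n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
p = X.shape[1]
resid = y - X @ beta_hat
s2 = resid @ resid / (n - p)                    # s^2 = sum(e_i^2) / (n - p)
XtX_inv = np.linalg.inv(X.T @ X)

x_star = np.array([1.0, 5.0])                   # the row vector x* of interest
y_hat = x_star @ beta_hat                       # estimated mean at x*
se_mean = np.sqrt(s2 * (x_star @ XtX_inv @ x_star))
t_crit = stats.t.ppf(0.975, df=n - p)

ci_mean = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)  # 95% CI for the mean at x*
```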

B. Process variation

This is just the estimated standard deviation of the process, $\sigma$, usually estimated by $s$. If we knew the true (conditional) mean at $x^*$, you'd expect about 68% of the observations within $\sigma$ of it, and about 95% within $2\sigma$ of it. The fact that $s$ is estimated alters that somewhat (but I won't labor the point by giving all the details here unless you need them, since this is usually not what people are after).

C. Standard error of prediction, prediction interval for an observation

This incorporates both of the previous two components. Let $y^*$ be an observation not in our sample.

Then the variance of $y^{*}-\hat{y}^{*}$ is $\sigma^2+\sigma^2 x^*(X'X)^{-1} {x^*}' = \sigma^2(1+x^* (X'X)^{-1} {x^*}')$, and so the standard error is $\sigma\sqrt{1+x^* (X'X)^{-1} {x^*}'}$.

Extending (slightly) the notion of a pivotal quantity, in similar fashion to the confidence interval we can construct a $1-\alpha$ interval for $y^{*}$: $\hat{y}^* \pm t_{1-\alpha/2}\, s \sqrt{1+x^* (X'X)^{-1} {x^*}'}$.
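Continuing the toy sketch from A, the only change is the extra process-variance term under the square root:

```python
# Prediction interval for a new observation at x*, continuing the sketch from A
se_pred = np.sqrt(s2 * (1.0 + x_star @ XtX_inv @ x_star))
pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)   # 95% prediction interval at x*
```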


Similar (if approximate) intervals can be fairly easily constructed in nonlinear models; for example, via a Taylor expansion. If you're able to identify which (if any) of the above calculations you need, we may be able to explore how to approach that.
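If it helps, here is a rough sketch of such a Taylor-expansion (delta-method) approximation for your exponential fit, reusing `f`, `popt`, `pcov`, `x_data`, and `y_data` from the curve_fit sketch in the question. It keeps the full covariance matrix (not just the diagonal), but it still leans on the constant-variance assumption questioned above, so treat it as illustrative rather than definitive:

```python
import numpy as np
from scipy import stats

def grad_f(x, a, b, c):
    # Partial derivatives of f(x) = a*exp(b*x) + c with respect to (a, b, c)
    e = np.exp(b * x)
    return np.column_stack([e, a * x * e, np.ones_like(x)])

x_grid = np.linspace(x_data.min(), x_data.max(), 200)
y_fit = f(x_grid, *popt)

J = grad_f(x_grid, *popt)                           # Jacobian, shape (len(x_grid), 3)
var_mean = np.einsum('ij,jk,ik->i', J, pcov, J)     # diag(J pcov J'), uses the covariances
se_mean = np.sqrt(var_mean)

n, p = len(x_data), len(popt)
resid = y_data - f(x_data, *popt)
s2 = resid @ resid / (n - p)                        # residual variance estimate
t_crit = stats.t.ppf(0.975, df=n - p)

# Approximate pointwise 95% confidence band for the mean curve
conf_lo = y_fit - t_crit * se_mean
conf_hi = y_fit + t_crit * se_mean

# Approximate pointwise 95% prediction band for new observations
se_pred = np.sqrt(s2 + var_mean)
pred_lo = y_fit - t_crit * se_pred
pred_hi = y_fit + t_crit * se_pred
```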
