Solved – N-sigma curves for a non-linear least-squares curve fit

curve fitting, least squares, nonlinear regression

I'm using Python's scipy.optimize.curve_fit routine (which uses non-linear least squares) to fit an exponential function of the form:

f(x) = a * exp(b*x) + c

to a set of data. The result looks like this:

[figure: the data (black triangles) with the fitted exponential curve (blue)]

where the black triangles are the data set and the blue curve is the f(x) fitted by the routine. The output of the process includes the optimal value for each parameter a, b, c, along with a covariance matrix from which I can obtain the standard deviation for each of them, i.e. I get:

$a = p_a \pm e_a$

$b = p_b \pm e_b$

$c = p_c \pm e_c$

where $p_i$ are the optimal values and $e_i$ their standard deviations.
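For concreteness, here is a minimal sketch of that setup (the synthetic data and the starting guess `p0` are illustrative assumptions, not my actual data):

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b, c):
    """Exponential model f(x) = a * exp(b*x) + c."""
    return a * np.exp(b * x) + c

# Illustrative synthetic data standing in for the real data set
rng = np.random.default_rng(0)
x_data = np.linspace(0.0, 4.0, 50)
y_data = f(x_data, 2.5, 0.8, 1.0) + rng.normal(0.0, 2.0, size=x_data.size)

# p0 is just an illustrative starting guess
popt, pcov = curve_fit(f, x_data, y_data, p0=(1.0, 0.5, 0.0))

p_a, p_b, p_c = popt                    # optimal values
e_a, e_b, e_c = np.sqrt(np.diag(pcov))  # standard deviations of the estimates
```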

What I want is to generate the N-$\sigma$ curves (where N is an integer) for the fit. As an example, here's an article where the authors show a similar exponential best fit (although a decaying one) along with its $1\sigma$ upper and lower curves (see caption):

[figure: the article's decaying-exponential best fit with its $1\sigma$ upper and lower curves]

This site gives an example of how to do this, but I'm not sure the procedure is entirely correct.

How can I correctly obtain such curves with the data I have available?

Best Answer

One big problem I see with ordinary least squares is the variance is clearly not constant but increases as $x$ does; I'd be inclined to consider a model where spread is related to mean (possibly proportional to it), suggesting either a generalized nonlinear model or maybe the use of a transformation (such as a log-transform) to stabilize the variance.

Without dealing with that issue, your variance-covariance and standard error estimates are biased.

Since you want to have estimates that are functions of such quantities, addressing this problem will be central to obtaining a usable answer.


In response to comments:

The standard errors of the parameter estimates and the residual standard deviation are built on the assumption of constant variance. This assumption is clearly violated. Consider if (by eye) I draw vertical bars intended to cover about 99.something percent of the local spread at each of several different x-values:

[figure: the data with hand-drawn vertical bars showing the local spread growing with $x$]

If calculations are based on that equal-variance assumption (as the usual standard errors are), then they will likely be of little value.

The authors of the paper your second plot comes from (a page reference would have been nice, by the way; I am not an astronomer, so I have little idea what's going on in the rest of it) appear to have fitted on a y-scale where the spread is not constant, and it indeed seems to increase in a fashion consistent with their error bars, so it's not clear they necessarily have a problem.

The plot in the blog post you link to doesn't look like the one from the paper; it appears to rest on different assumptions. Things that look superficially similar may not be.

If the spread looks roughly constant on the log scale (and there are no actual zeros), then you could fit a nonlinear least squares model of the form $E(\log y) = \log(a e^{bx}+c)$, and then convert that to a fit on the untransformed scale - but beware that expectations on the two scales don't directly correspond. If you have near normality on the log scale you can still get an estimated expectation on the original scale. Failing that, you might be left with either making a particular distributional assumption, or a Taylor approximation to the expectation.
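As a rough sketch of what fitting on the log scale might look like (reusing `f`, `x_data`, and `y_data` from the curve_fit sketch in the question; the starting values are again just illustrative, and it assumes all observations are strictly positive):

```python
import numpy as np
from scipy.optimize import curve_fit

def log_model(x, a, b, c):
    # Mean of log(y) modelled as log(a*exp(b*x) + c); requires a*exp(b*x) + c > 0
    return np.log(a * np.exp(b * x) + c)

# x_data, y_data as in the earlier sketch; all y_data assumed strictly positive
popt_log, pcov_log = curve_fit(log_model, x_data, np.log(y_data), p0=(1.0, 0.5, 1.0))

# Naive back-transform of the fitted curve: on the original scale this is closer
# to a median than a mean unless a correction for the log-scale spread is applied
x_grid = np.linspace(x_data.min(), x_data.max(), 200)
fit_back = np.exp(log_model(x_grid, *popt_log))
```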


Several kinds of intervals

There are several main kinds of intervals people may wish to place on a plot. Here I discuss three:

$\;$ 1. a pointwise (confidence) interval (based on conditional standard error of the mean). This is an interval for where the curve (the conditional mean) may be, for each x-value. That is, specify a particular $x$, and take a vertical slice there. We can produce an estimate of the standard error for the mean at that $x$; a repeated set of such two standard-error intervals* (from different data) at a given $x$ would be expected to include the population mean at that $x$ about 95% of the time if certain assumptions were satisfied.

This seems to be something like what is being attempted in the blog post you mention in your final link, but crucially, looking at the Python code they seem to be ignoring the effect of the parameter covariances, which would make the result wrong.

$\;$ *(We would not say "$2\sigma$" though, since $\sigma$ would conventionally be a population quantity, and this is a sample estimate of something)

$\;$ 2. An overall interval for the mean. This would still be an interval for the mean but one based on global coverage probabilities. (I don't think you're after this.)

$\;$ 3. A (pointwise) prediction interval. This would be an interval with a specified probability of including a new observation at each given $x$. It includes both parameter uncertainty (due to both variances and covariances of parameter estimates) and the observation "noise" (based on a sample estimate of what I'd call $\sigma$). This is what at first glance seems to be going on in the second plot, but I am not 100% certain.

You should clarify what kind of interval you need, and what exactly you mean by "$\sigma$".


Since you aren't really familiar with the usual standard errors and intervals in linear models, let's cover those. (The interval calculations below are all pointwise, not global.)

Standard errors and intervals in linear models

A. Standard error of mean, confidence interval for mean

For a linear model $y=X\beta+\varepsilon$ with $\text{Var}(\varepsilon)=\sigma^2 I$, we have, for a matrix $C$:

$\text{Var}(C\hat{\beta})=\sigma^2C(X'X)^{-1}C'$.

Hence, if $c$ is a row-vector of constants, $\text{se}(c\hat\beta)=\sigma\sqrt{c(X'X)^{-1}c'}$.

If you estimate $\sigma^2$ by $s^2=\frac{1}{n-p}\sum_i e_i^2$ (where $p$ is the number of predictors, including the intercept term), then $\frac{c\hat\beta-c\beta}{\text{se}(c\hat\beta)}$ is a pivotal quantity* with a $t$-distribution with $n-p$ d.f.

A $1-\alpha$ confidence interval for the mean (at $c=x^*$) is therefore $x^* \hat\beta \pm t_{1-\alpha/2}\, s \sqrt{x^* (X'X)^{-1} {x^*}'}$, or more succinctly, $\hat{y} \pm t_{1-\alpha/2}\, s \sqrt{x^* (X'X)^{-1} {x^*}'}$.

* basically a pivotal quantity is a function of a parameter or parameters and data whose distribution doesn't depend on the parameter(s)
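A small numeric illustration of A (the toy straight-line data and design matrix are my own, purely to show the algebra above):

```python
import numpy as np
from scipy import stats

# Toy straight-line data purely to illustrate the algebra above
rng = np.random.default_rng(1)
n = 30
x = np.linspace(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x])            # design matrix: intercept + slope
y = X @ np.array([1.0, 2.0]) + rng.normal(0.0, 1.0, n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
p = X.shape[1]
resid = y - X @ beta_hat
s2 = resid @ resid / (n - p)                    # s^2 = sum(e_i^2) / (n - p)
XtX_inv = np.linalg.inv(X.T @ X)

x_star = np.array([1.0, 5.0])                   # the row vector x* of interest
y_hat = x_star @ beta_hat                       # estimated mean at x*
se_mean = np.sqrt(s2 * (x_star @ XtX_inv @ x_star))
t_crit = stats.t.ppf(0.975, df=n - p)

ci_mean = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)  # 95% CI for the mean at x*
```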

B. Process variation

This is just the estimated standard deviation of the process, $\sigma$, usually estimated by $s$. If we knew the true (conditional) mean at $x^*$, you'd expect about 68% of the observations within $\sigma$ of it, and about 95% within $2\sigma$ of it. The fact that $s$ is estimated alters that somewhat (but I won't labor the point by giving all the details here unless you need them, since this is usually not what people are after).

C. Standard error of prediction, prediction interval for an observation

This incorporates both of the previous two components. Let $y^*$ be an observation not in our sample.

Then the variance of $y^{*}-\hat{y}^{*}$ is $\sigma^2+\sigma^2 x^*(X'X)^{-1} {x^*}' = \sigma^2(1+x^* (X'X)^{-1} {x^*}')$, and so the standard error is $\sigma\sqrt{1+x^* (X'X)^{-1} {x^*}'}$.

Extending (slightly) the notion of a pivotal quantity, in similar fashion to the confidence interval we can construct a $1-\alpha$ interval for $y^{*}$: $\hat{y}^* \pm t_{1-\alpha/2}\, s \sqrt{1+x^* (X'X)^{-1} {x^*}'}$.
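Continuing the toy sketch from A, the only change is the extra process-variance term under the square root:

```python
# Prediction interval for a new observation at x*, continuing the sketch from A
se_pred = np.sqrt(s2 * (1.0 + x_star @ XtX_inv @ x_star))
pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)   # 95% prediction interval at x*
```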


Similar (if approximate) intervals can be fairly easily constructed in nonlinear models; for example, via a Taylor expansion. If you're able to identify which (if any) of the above calculations you need, we may be able to explore how to approach that.
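If it helps, here is a rough sketch of such a Taylor-expansion (delta-method) approximation for your exponential fit, reusing `f`, `popt`, `pcov`, `x_data`, and `y_data` from the curve_fit sketch in the question. It keeps the full covariance matrix (not just the diagonal), but it still leans on the constant-variance assumption questioned above, so treat it as illustrative rather than definitive:

```python
import numpy as np
from scipy import stats

def grad_f(x, a, b, c):
    # Partial derivatives of f(x) = a*exp(b*x) + c with respect to (a, b, c)
    e = np.exp(b * x)
    return np.column_stack([e, a * x * e, np.ones_like(x)])

x_grid = np.linspace(x_data.min(), x_data.max(), 200)
y_fit = f(x_grid, *popt)

J = grad_f(x_grid, *popt)                           # Jacobian, shape (len(x_grid), 3)
var_mean = np.einsum('ij,jk,ik->i', J, pcov, J)     # diag(J pcov J'), uses the covariances
se_mean = np.sqrt(var_mean)

n, p = len(x_data), len(popt)
resid = y_data - f(x_data, *popt)
s2 = resid @ resid / (n - p)                        # residual variance estimate
t_crit = stats.t.ppf(0.975, df=n - p)

# Approximate pointwise 95% confidence band for the mean curve
conf_lo = y_fit - t_crit * se_mean
conf_hi = y_fit + t_crit * se_mean

# Approximate pointwise 95% prediction band for new observations
se_pred = np.sqrt(s2 + var_mean)
pred_lo = y_fit - t_crit * se_pred
pred_hi = y_fit + t_crit * se_pred
```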
