Solved – How to compute prediction bands for non-linear regression

Tags: nonlinear regression, prediction interval

The help page for Prism gives the following explanation of how it computes prediction bands for non-linear regression. Please excuse the long quote, but I am not following the second paragraph, which explains how $G|x$ is defined and how $dY/dP$ is computed. Any help would be greatly appreciated.

The calculation of the confidence and prediction bands are fairly
standard. Read on for the details of how Prism computes prediction and
confidence bands of nonlinear regression.

First, let's define G|x, which is the gradient of the parameters at a
particular value of X and using all the best-fit values of the
parameters. The result is a vector, with one element per parameter.
For each parameter, it is defined as dY/dP, where Y is the Y value of
the curve given the particular value of X and all the best-fit
parameter values, and P is one of the parameters.

G'|x is that gradient vector transposed, so it is a column rather than
a row of values.

Cov is the covariance matrix (the inverse Hessian from the last iteration).
It is a square matrix with the number of rows and columns equal to the
number of parameters. Each item in the matrix is the covariance
between two parameters.

Now compute c = G'|x * Cov * G|x. The result is a single number for
any value of X.

The confidence and prediction bands are centered on the best fit
curve, and extend above and below the curve an equal amount.

The confidence bands extend above and below the curve by:
sqrt(c)*sqrt(SS/DF)*CriticalT(Confidence%, DF)

The prediction bands extend a further distance above and below the
curve, equal to:
sqrt(c+1)*sqrt(SS/DF)*CriticalT(Confidence%, DF)

Best Answer

This is called the Delta Method.

Suppose that you have some function $y = G(\beta,x) + \epsilon$; note that $G(\cdot)$ is a function of the parameters that you estimate, $\beta$, and the values of your predictors, $x$. First, find the derivative of this function with respect to your vector of parameters, $\beta$: $G^\prime(\beta, x)$. This says, if you change a parameter by a little bit, how much does your function change? Note that this derivative may be a function of your parameters themselves as well as the predictors. For example, if $G(\beta,x) = \exp (\beta x)$, then the derivative is $x \exp (\beta x)$, which depends upon the value of $\beta$ and the value of $x$. To evaluate this, you plug in the estimate of $\beta$ that your procedure gives, $\hat{\beta}$, and the value of the predictor $x$ where you want the prediction.
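To make the gradient step concrete, here is a minimal numerical check in Python, using the example model $G(\beta, x) = \exp(\beta x)$ from the paragraph above; the values of $\hat{\beta}$ and $x$ are made up for illustration:

```python
import math

def G(beta, x):
    # Example model from the text: G(beta, x) = exp(beta * x)
    return math.exp(beta * x)

def dG_dbeta(beta, x):
    # Analytic derivative with respect to the parameter beta:
    # d/dbeta exp(beta * x) = x * exp(beta * x)
    return x * math.exp(beta * x)

# Evaluate at (hypothetical) beta_hat = 0.5 and predictor value x = 2.0,
# and verify the analytic derivative against a central finite difference.
beta_hat, x = 0.5, 2.0
h = 1e-6
numeric = (G(beta_hat + h, x) - G(beta_hat - h, x)) / (2 * h)
print(dG_dbeta(beta_hat, x), numeric)  # the two values agree to ~1e-5
```

With several parameters, the same finite-difference check is done once per parameter, giving the gradient vector $G|x$ that the Prism help refers to.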

The Delta Method, derived from maximum likelihood procedures, states that the variance of $G\left(\hat{\beta}, x\right)$ is going to be $$G^\prime\left(\hat{\beta},x\right)^T \text{Var}\left(\hat{\beta}\right) G^\prime\left(\hat{\beta},x\right),$$ where $\text{Var}\left(\hat{\beta}\right)$ is the variance-covariance matrix of your estimates (this is equal to the inverse of the Hessian---the second derivatives of the likelihood function at your estimates). The function that your statistics package employs calculates this value for each different value of the predictor $x$. The result is a single number, not a vector, for each value of $x$.
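The quadratic form above is a one-liner in NumPy. The gradient vector and covariance matrix below are made-up numbers for a hypothetical two-parameter fit, just to show the shapes involved:

```python
import numpy as np

# Hypothetical gradient at one value of x: one entry (dY/dP) per parameter
grad = np.array([1.3, -0.4])

# Hypothetical Var(beta_hat): the inverse Hessian, a 2x2 symmetric matrix
cov = np.array([[0.05, 0.01],
                [0.01, 0.02]])

# Delta Method variance of the fitted curve at this x:
# G'(beta_hat, x)^T  Var(beta_hat)  G'(beta_hat, x)
var_curve = grad @ cov @ grad
print(var_curve)  # a single number for this value of x, here 0.0773
```

Repeating this for a grid of $x$ values yields the variance profile from which the bands are drawn.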

This gives the variance of the value of the function at each point, and this variance is used just like any other in building confidence intervals: take its square root, multiply by the critical value of the normal or appropriate t distribution for the chosen confidence level, and add this amount to, and subtract it from, the estimate of $G(\cdot)$ at that point.

For prediction intervals, we need to take the variance of the outcome given the predictors $x$, $\text{Var}(y \mid x) \equiv \sigma^2$, into account. Hence, we must boost our variance from the Delta Method by our estimate of the variance of $\epsilon$, $\hat{\sigma}^2$, to get the variance of $y$, rather than the variance of the expected value of $y$ that is used for confidence intervals. Note that $\hat{\sigma}^2$ is the sum of squared errors (SS in help file notation) divided by the degrees of freedom (DF).
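Putting the last two paragraphs together reproduces the half-widths in the Prism help quoted above. All inputs here are invented for illustration; the critical t value is hard-coded from a table rather than computed:

```python
import math

# Hypothetical inputs for one value of x:
c = 0.0773          # c = G'|x * Cov * G|x from the Delta Method
SS, DF = 1.25, 29   # sum of squared residuals and its degrees of freedom
t_crit = 2.045      # two-sided 95% critical t for DF = 29 (from a t table)

s = math.sqrt(SS / DF)                      # sigma_hat, residual standard error

# Confidence band: uncertainty in the fitted curve only
conf_half = math.sqrt(c) * s * t_crit

# Prediction band: also includes sigma^2, the variance of a new observation,
# which is where the "+ 1" inside the square root comes from
pred_half = math.sqrt(c + 1) * s * t_crit

print(conf_half, pred_half)
```

Because of the extra 1 under the square root, the prediction band is always wider than the confidence band, and it does not shrink toward zero as the sample size grows.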

In the notation used in the help file above, it looks like their value of c does not take $\sigma^2$ into account; that is, the inverse of their Hessian is $\sigma^{-2}$ times the one that I give. I'm not sure why they do that. It could be a way of writing the confidence and prediction intervals in a more familiar way (of $\sigma$ times some number times some critical value). The variance that I give is actually c*SS/DF in their notation.

For example, in the familiar case of linear regression, their c would be based on $\left(x^\prime x\right)^{-1}$, while $\text{Var}\left(\hat{\beta}\right) = \sigma^2 \left(x^\prime x\right)^{-1}$.
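As a sanity check of this correspondence, one can fit an ordinary linear model and compute Prism's c directly: for a linear model the Delta Method gradient at a point $x_0$ is $x_0$ itself, so $c = x_0^\prime (X^\prime X)^{-1} x_0$ and the variance of the fitted value is $\hat{\sigma}^2 c$. The data, seed, and evaluation point below are all made up:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one predictor
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Ordinary least squares fit; lstsq also returns the residual sum of squares
beta_hat, ss, *_ = np.linalg.lstsq(X, y, rcond=None)
df = n - X.shape[1]
sigma2_hat = ss[0] / df                    # SS/DF in the help-file notation

xtx_inv = np.linalg.inv(X.T @ X)
x0 = np.array([1.0, 0.3])                  # point at which to evaluate the bands

# For a linear model the Delta Method gradient is x0 itself, so:
c = x0 @ xtx_inv @ x0                      # Prism's c (no sigma^2 factor)
var_fit = sigma2_hat * c                   # variance of the fitted value at x0
print(c, var_fit)
```

This makes the answer's point explicit: c carries the geometry of the design, and multiplying by SS/DF (the estimate of $\sigma^2$) recovers the usual variance of the fitted value.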