This is called the Delta Method.
Suppose that you have some function $y = G(\beta,x) + \epsilon$; note that $G(\cdot)$ is a function of the parameters that you estimate, $\beta$, and the values of your predictors, $x$. First, find the derivative of this function with respect to your vector of parameters, $\beta$: $G^\prime(\beta, x)$. This says, if you change a parameter by a little bit, how much does your function change? Note that this derivative may be a function of your parameters themselves as well as the predictors. For example, if $G(\beta,x) = \exp (\beta x)$, then the derivative is $x \exp (\beta x)$, which depends upon the value of $\beta$ and the value of $x$. To evaluate this, you plug in the estimate of $\beta$ that your procedure gives, $\hat{\beta}$, and the value of the predictor $x$ where you want the prediction.
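To make the derivative step concrete, here is a minimal sketch in Python/NumPy for the $G(\beta,x) = \exp(\beta x)$ example above, with the assumed values of $\hat{\beta}$ and $x$ purely illustrative:

```python
import numpy as np

# Illustrative example: G(beta, x) = exp(beta * x), scalar beta.
def G(beta, x):
    return np.exp(beta * x)

def G_prime(beta, x):
    # Analytic derivative with respect to beta: x * exp(beta * x).
    return x * np.exp(beta * x)

beta_hat = 0.5   # assumed estimate from some fitting procedure
x0 = 2.0         # predictor value at which we want the prediction

# Sanity-check the analytic derivative with a central finite difference.
h = 1e-6
numeric = (G(beta_hat + h, x0) - G(beta_hat - h, x0)) / (2 * h)
print(G_prime(beta_hat, x0), numeric)
```

The two printed numbers should agree to several decimal places, confirming the derivative is right before it gets used in the variance formula.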
The Delta Method, derived from maximum likelihood procedures, states that the variance of $G\left(\hat{\beta}, x\right)$ is going to be
$$G^\prime\left(\hat{\beta},x\right)^T \text{Var}\left(\hat{\beta}\right) G^\prime\left(\hat{\beta},x\right),$$
where $\text{Var}\left(\hat{\beta}\right)$ is the variance-covariance matrix of your estimates (this is equal to the inverse of the observed information---the negative of the matrix of second derivatives of the log-likelihood at your estimates). The function that your statistics package employs calculates this value for each value of the predictor $x$. The result is just a number, not a vector, for each value of $x$.
This gives the variance of the value of the function at each point, and it is used just like any other variance in constructing confidence intervals: take the square root, multiply by the critical value of the normal or applicable $t$ distribution for the desired confidence level, and add and subtract the result from the estimate of $G(\cdot)$ at that point.
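The variance formula and the interval construction can be sketched together. This is a hypothetical two-parameter model with assumed values for $\hat{\beta}$ and its covariance matrix (in practice these come from your fitting routine):

```python
import numpy as np

# Hypothetical model: G(beta, x) = beta0 * exp(beta1 * x).
def G(beta, x):
    return beta[0] * np.exp(beta[1] * x)

def G_prime(beta, x):
    # Gradient with respect to beta: [exp(b1 x), b0 * x * exp(b1 x)].
    return np.array([np.exp(beta[1] * x), beta[0] * x * np.exp(beta[1] * x)])

beta_hat = np.array([2.0, 0.3])       # assumed estimates
vcov = np.array([[0.04, -0.002],      # assumed Var(beta_hat), e.g. the inverse
                 [-0.002, 0.001]])    # observed information from the fit

x0 = 1.5
g = G_prime(beta_hat, x0)
var_G = g @ vcov @ g                  # G'^T Var(beta_hat) G' -- a scalar
se = np.sqrt(var_G)

z = 1.96                              # normal critical value, 95% level
lo = G(beta_hat, x0) - z * se
hi = G(beta_hat, x0) + z * se
print(lo, hi)
```

Note that `var_G` is a scalar even though `g` and `vcov` are a vector and a matrix: the quadratic form collapses them to one number per value of $x$, exactly as described above.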
For prediction intervals, we need to take the variance of the outcome given the predictors $x$, $\text{Var}(y \mid x) \equiv \sigma^2$, into account. Hence, we must boost our variance from the Delta Method by our estimate of the variance of $\epsilon$, $\hat{\sigma}^2$, to get the variance of $y$, rather than the variance of the expected value of $y$ that is used for confidence intervals. Note that $\hat{\sigma}^2$ is the sum of squared errors (`SS` in help file notation) divided by the degrees of freedom (`DF`).
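The difference between the two intervals comes down to one added term. A small sketch, with purely illustrative numbers standing in for the fitted quantities:

```python
import numpy as np

# Assumed quantities from a fit (illustrative numbers only):
var_mean = 0.12          # delta-method variance of G(beta_hat, x0)
SS = 8.4                 # sum of squared residuals
DF = 42                  # residual degrees of freedom
sigma2_hat = SS / DF     # estimate of Var(epsilon)

# Confidence interval uses var_mean; prediction interval adds sigma2_hat.
se_conf = np.sqrt(var_mean)
se_pred = np.sqrt(var_mean + sigma2_hat)
print(se_conf, se_pred)  # the prediction band is always the wider one
```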
In the notation used in the help file above, it looks like their value of `c` does not take $\sigma^2$ into account; that is, the inverse of their Hessian is $\sigma^{-2}$ times the one that I give. I'm not sure why they do that. It could be a way of writing the confidence and prediction intervals in a more familiar way (of $\sigma$ times some number times some critical value). The variance that I give is actually `c*SS/DF` in their notation.
For example, in the familiar case of linear regression, their `c` would be $\left(x^\prime x\right)^{-1}$, while $\text{Var}\left(\hat{\beta}\right) = \sigma^2 \left(x^\prime x\right)^{-1}$.
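In the linear case the equivalence can be checked numerically: with $G(\beta, x) = x\beta$ the gradient is just $x$, so the delta-method quadratic form should match the classic $\sigma^2 x \left(x^\prime x\right)^{-1} x^\prime$ expression. A sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated linear regression so the two expressions can be compared.
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

XtX_inv = np.linalg.inv(X.T @ X)         # their "c" in the linear case
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - 2)     # SS / DF
vcov = sigma2_hat * XtX_inv              # Var(beta_hat)

# Delta method at a point x0: G(beta, x) = x @ beta, so G' = x0.
x0 = np.array([1.0, 0.7])
var_delta = x0 @ vcov @ x0
var_classic = sigma2_hat * (x0 @ XtX_inv @ x0)   # the c*SS/DF form
print(var_delta, var_classic)
```

The two printed variances agree, which is the point: the delta method reduces to the textbook formula when $G$ is linear in $\beta$.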
Best Answer
Confidence and prediction bands should be expected to typically get wider near the ends, and for the same reason that they always do so in ordinary regression: generally, the parameter uncertainty leads to wider intervals near the ends of the data than in the middle.
You can see this by simulation easily enough, either by simulating data from a given model, or by simulating from the sampling distribution of the parameter vector.
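Here is a sketch of the second approach: drawing parameter vectors from an (assumed normal) approximation to the sampling distribution and looking at the spread of the fitted curve at each $x$. The model, estimates, and covariance are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed fitted model G(beta, x) = beta0 * exp(beta1 * x), with an
# assumed estimate and covariance standing in for a real fit's output.
beta_hat = np.array([2.0, 0.3])
vcov = np.array([[0.04, -0.002],
                 [-0.002, 0.001]])

# Draw parameter vectors from the approximate sampling distribution.
draws = rng.multivariate_normal(beta_hat, vcov, size=5000)
xs = np.linspace(0.0, 5.0, 6)

# Curve value for every draw at every x; the std across draws
# approximates the standard error of the mean prediction at each x.
curves = draws[:, [0]] * np.exp(draws[:, [1]] * xs)
se = curves.std(axis=0)
print(se)  # the standard error varies with x: the band is not constant-width
```

Plotting `se` against `xs` shows directly that the band width changes across the range of the predictor.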
The usual (approximately correct) calculations done for nonlinear regression involve taking a local linear approximation (this is given in Harvey's answer), but even without those we can get some notion of what's going on.
However, doing the actual calculations is nontrivial, and it may be that programs take a shortcut in the calculation which ignores that effect. It's also possible that for some data and some models the effect is relatively small and hard to see. Indeed, with prediction intervals, especially with large variance but lots of data, it can sometimes be hard to see the curvature even in ordinary linear regression: the bands can look almost straight, and it's not so easy to discern the deviation from straightness.
Here's an example of how hard it can be to see just with a confidence interval for the mean (prediction intervals can be far harder to see because their relative variation is so much less). Here's some data and a nonlinear least squares fit, with a confidence interval for the population mean (in this case generated from the sampling distribution since I know the true model, but something very similar could be done by asymptotic approximation or by bootstrapping):
The purple bounds look almost parallel to the blue predictions... but they aren't. Here's the standard error of the sampling distribution of those mean predictions:
which clearly isn't constant.
Edit:
Those `sp` expressions you just posted come straight from the prediction interval for linear regression!