I haven't investigated why the two CIs are unequal because I think both are wrong. The common problem is that the estimated parameters are likely correlated, perhaps heavily so, but neither procedure appears to account for that. (In the realistic examples shown below, the correlation ranges from -99.3% to -99.8%.)
To find confidence bands, start with the model in the alternative form
$$y = \exp(\alpha + \beta \log(x)) + \varepsilon.$$
The error term is $\varepsilon$ and $(\alpha,\beta)$ is the model parameter to be estimated. As in the question, I will use nonlinear least squares to fit the model. (Taking logarithms of both sides would be futile, because the additive error term prevents any simplification of the right hand side. Indeed, as the examples below indicate, it is possible--and perfectly OK--for observed values of $y$ to be negative.)
One output of the fitting procedure will be the variance-covariance matrix of the parameter estimates,
$$\mathbb{V} = \begin{pmatrix}\sigma^2_\alpha & \rho \sigma_\alpha\sigma_\beta \\ \rho \sigma_\alpha\sigma_\beta & \sigma^2_\beta\end{pmatrix}.$$
The square roots of the diagonal elements, $\sigma_\alpha$ and $\sigma_\beta,$ are the standard errors of the estimated parameters $\hat\alpha$ and $\hat\beta,$ respectively. $\rho$ estimates the correlation coefficient of those estimates.
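In R these quantities can be read directly off the fitted object. A minimal sketch, assuming `fit` is the `nls` fit of this reparameterized model (as in the code further below):

```r
# Extract the variance-covariance matrix of (alpha.hat, beta.hat) and the
# derived quantities used in the text; `fit` is assumed to be an nls object.
V <- vcov(fit)
se.alpha <- sqrt(V[1, 1])                    # sigma_alpha
se.beta  <- sqrt(V[2, 2])                    # sigma_beta
rho      <- V[1, 2] / (se.alpha * se.beta)   # estimated correlation of the estimates
```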
The predicted value at any (positive) number $x_0$ is
$$\hat{y}(x_0) = \exp(\hat\alpha + \hat\beta\log(x_0)) = e^\hat\alpha x_0^\hat\beta = \frac{\hat A}{\hat\beta} x_0^\hat\beta$$
where $\hat A=\hat\beta e^\hat\alpha,$ showing this really is the intended model as formulated in the question. Taking logarithms (which now is possible because there are no error terms in the equation) gives
$$\log(\hat y(x_0)) = \hat\alpha + \hat\beta \log(x_0) = (1, \log(x_0))\ (\hat\alpha,\hat\beta)^\prime.$$
Thus the standard error of the logarithm of the estimated response is
$$SE(x_0) = SE(\log(\hat y(x_0))) = \sqrt{(1, \log(x_0))\mathbb{V}(1, \log(x_0))^\prime}.$$
Being able to formulate the SE in this way was the whole point of the initial reparameterization of the model: the parameters $\alpha$ and $\beta$ (or, rather, their estimates) enter linearly into the calculation of the standard error of $\hat y.$ There is no need to expand Taylor series or "propagate error."
To construct a $100(1-a)\%$ confidence interval for $\log(y(x_0)),$ do as usual and set the endpoints at $$\log(\hat y(x_0)) \pm Z_{a/2}\, SE(x_0)$$ where $Z_{a/2},$ defined by $\Phi(Z_{a/2}) = a/2,$ is the $a/2$ percentage point of the standard Normal distribution. Because this interval encloses the true log response with $100(1-a)\%$ probability and the exponential function is monotonic, the coverage is preserved upon taking antilogarithms. That is,
A $100(1-a)\%$ confidence interval for $y(x_0)$ has endpoints
$$\begin{aligned} &\exp\left(\log(\hat y(x_0)) \pm Z_{a/2}\,
SE(x_0)\right)\\ &= \left[\frac{\hat y(x_0)}{\exp\left(|Z_{a/2}|
SE(x_0)\right)},\ \hat y(x_0)\exp\left(|Z_{a/2}|
SE(x_0)\right)\right]. \end{aligned}$$
By doing this for a sequence of values of $x_0$ you can construct confidence bands for the regression. If all is well (that is, all model assumptions are accurate and there are enough data to assure the sampling distribution of $(\hat\alpha,\hat\beta)$ is approximately Normal), you can hope that $100(1-a)\%$ of these bands envelop the graph of the true response function $x\to \exp(\alpha + \beta\log(x)).$
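Before turning to the simulation, here is the whole computation at a single prediction point, as a minimal sketch; it assumes `fit` is the `nls` fit of the reparameterized model (coefficients `alpha` and `beta`, as in the code below) and uses $a = 0.10$:

```r
# 90% confidence interval for y(x0) at a single point x0.
x0 <- 50
u  <- c(1, log(x0))
se <- sqrt(c(u %*% vcov(fit) %*% u))        # SE of log(y.hat(x0))
log.y.hat <- sum(u * coef(fit))             # alpha.hat + beta.hat * log(x0)
exp(log.y.hat + qnorm(c(0.05, 0.95)) * se)  # lower and upper endpoints
```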
To illustrate, I generated $20$ datasets having properties similar to those in the problem: the true $\alpha$ and $\beta$ are close to the estimates reported in the question and the error variance $\operatorname{Var}(\varepsilon)$ was set to make the sum of squares of residuals close to that reported in the question (near 90,000). I used the foregoing technique to fit this model to each dataset and then for each one plotted (a) the data, (b) the $90\%$ confidence band and, for reference, (c) the graph of the true response function. The latter is colored red wherever it lies beyond the confidence band. The test of this approach is that about $10\%,$ or two of the $20,$ panels ought to show red portions: and that's exactly what happened (in iterations 10 and 16).
![Figure](https://i.stack.imgur.com/nmh3r.png)
For details, consult this R code, which generated and plotted the simulations.
#
# Describe the model and the data.
#
x <- seq(10, 100, length.out=51)
alpha <- log(7.5)
beta <- 0.6
sigma <- sqrt(90000 / length(x)) # Error SD
a <- 0.10 # Test level
nrow <- 4 # Rows for the simulation
ncol <- 5 # Columns for the simulation
x.0 <- seq(min(x)*0.5, max(x)*1.1, length.out=101) # Prediction points
f <- function(x, theta) exp(theta[1] + theta[2]*log(x))
X <- data.frame(x=x, y.0=f(x, c(alpha, beta)))
#
# Create the datasets.
#
set.seed(17)
data.lst <- lapply(seq_len(nrow*ncol), function(i) {
  X$y <- X$y.0 + rnorm(nrow(X), 0, sigma)
  X$Iteration <- i
  X
})
#
# Fit the model to the datasets.
#
Z <- qnorm(a/2) # For computing a 100(1-a)% confidence band
results.lst <- lapply(seq_along(data.lst), function(i) {
  #
  # Fit the data.
  #
  X <- data.lst[[i]]
  fit <- nls(y ~ f(x, c(alpha, beta)), data=X, start=c(alpha=0, beta=0))
  print(fit) # (Optional)
  #
  # Compute the SEs for log(y).
  #
  V <- vcov(fit)
  se2 <- sapply(log(x.0), function(xi) {
    u <- c(1, xi)
    u %*% V %*% u
  })
  se <- sqrt(se2)
  #
  # Compute the CIs.
  #
  y <- log(f(x.0, coefficients(fit))) # The estimated log responses at the prediction points
  data.frame(Iteration = i,
             x = x.0,
             y = exp(y),
             y.lower = exp(y + Z * se),
             y.upper = exp(y - Z * se))
})
#
# Plot the results.
#
X <- do.call(rbind, data.lst)
Y <- do.call(rbind, results.lst)
Y$y.0 <- f(Y$x, c(alpha, beta)) # Reference curve
library(ggplot2)
ggplot(Y, aes(x)) +
  geom_ribbon(aes(ymin=y.lower, ymax=y.upper), fill="Gray") +
  geom_line(aes(y=y.0, color=(y.lower <= y.0 & y.0 <= y.upper)), size=1,
            show.legend=FALSE) +
  geom_point(aes(y=y), data=X, alpha=1/4) +
  facet_wrap(~ Iteration, nrow=nrow) +
  ylab("y") +
  ggtitle(paste0("Simulated datasets and ", 100*(1 - a), "% bands"),
          "True model shown (in color) for reference")
Assessing the hypothesis that $a$ and $b$ are different is equivalent to testing the null hypothesis $a - b = 0$ (against the alternative that $a-b\ne 0$).
The following analysis presumes it is reasonable for you to estimate $a-b$ as $$U = \hat a - \hat b.$$ It also accepts your model formulation (which often is a reasonable one), which--because the errors are additive (and could even produce negative observed values of $y$)--does not permit us to linearize it by taking logarithms of both sides.
The variance of $U$ can be expressed in terms of the covariance matrix $(c_{ij})$ of $(\hat a, \hat b)$ as
$$\operatorname{Var}(U) = \operatorname{Var}(\hat a - \hat b) = \operatorname{Var}(\hat a) + \operatorname{Var}(\hat b) - 2 \operatorname{Cov}(\hat a, \hat b) = c_{11} + c_{22} - 2c_{12}.$$
When $(\hat a, \hat b)$ is estimated with least squares, one usually uses a "t test": that is, the distribution of $$t = U / \sqrt{\operatorname{Var}(U)}$$ is approximated by a Student t distribution with $n-2$ degrees of freedom (where $n$ is the data count and $2$ counts the number of coefficients). Regardless of which approximation you adopt, $t$ usually is the basis of the test: you may perform a Z test (when $n$ is large or when fitting with Maximum Likelihood) or bootstrap it, for instance.
To be specific, the p-value of the t test is given by
$$p = 2t_{n-2}(-|t|)$$
where $t_{n-2}$ is the Student t (cumulative) distribution function. It is one expression for the "tail area:" the chance that a Student t variable (of $n-2$ degrees of freedom) equals or exceeds the size of the test statistic, $|t|.$
More generally, for numbers $c_1,$ $c_2,$ and $\mu$ you can use exactly the same approach to test any hypothesis
$$H_0: c_1 a + c_2 b = \mu$$
against the two-sided alternative. (This encompasses the special but widespread case of a "contrast".) Use the estimated variance-covariance matrix $(c_{ij})$ to estimate the variance of $U = c_1 \hat a + c_2 \hat b,$ namely $\operatorname{Var}(U) = c_1^2 c_{11} + c_2^2 c_{22} + 2 c_1 c_2 c_{12},$ and form the statistic
$$t = (c_1 \hat a + c_2 \hat b - \mu) / \sqrt{\operatorname{Var}(U)}.$$
The foregoing is the case $(c_1,c_2) = (1,-1)$ and $\mu=0.$
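Assembled into code, the test for a single fitted model takes only a few lines. A sketch, assuming `fit` is an `nls` fit with coefficients named `a` and `b` (as in the simulation code below) and `n` is the number of observations:

```r
# t statistic and two-sided p-value for H0: c1*a + c2*b = mu.
H  <- c(1, -1)                          # (c1, c2); here this tests a - b = 0
mu <- 0
U  <- sum(H * coef(fit)) - mu
se <- sqrt(c(H %*% vcov(fit) %*% H))    # SE of the estimated contrast
t.stat  <- U / se
p.value <- 2 * pt(-abs(t.stat), df = n - 2)
```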
To check that this advice is correct, I ran the following R code to create data according to this model (with Normally distributed errors `e`), fit them, and compute the values of $t$ many times. The check is that the probability plot of $t$ (based on the assumed Student t distribution) closely follows the diagonal. Here is that plot in a simulation of size $500$ where $n=5$ (a very small dataset, chosen because the $t$ distribution is far from Normal) and $a=b=-1/2.$
![Probability plot](https://i.stack.imgur.com/xl3rl.png)
In this example, at least, the procedure works beautifully. Consider re-running the simulation using parameters $a,$ $b,$ $\sigma$ (the error standard deviation), and $n$ that reflect your situation.
Here is the code.
#
# Specify the true parameters.
#
set.seed(17)
a <- -1/2
b <- -1/2
sigma <- 0.25 # SD of the errors
n <- 5 # Sample size
n.sim <- 500 # Simulation size
#
# Specify the hypothesis.
#
H.0 <- c(1, -1) # Coefficients of `a` and `b`.
mu <- 0
#
# Provide x and z values in terms of their logarithms.
#
log.x <- log(rexp(n))
log.z <- log(rexp(n))
#
# Compute y without error.
#
y.0 <- exp(a * log.x + b * log.z)
#
# Conduct a simulation to estimate the sampling distribution of the t statistic.
#
sim <- replicate(n.sim, {
  #
  # Add the errors.
  #
  e <- rnorm(n, 0, sigma)
  df <- data.frame(log.x=log.x, log.z=log.z, y.0, y=y.0 + e)
  #
  # Guess the solution.
  #
  fit.ols <- lm(log(y) ~ log.x + log.z - 1, subset(df, y > 0))
  start <- coefficients(fit.ols) # Initial values of (a.hat, b.hat)
  #
  # Polish it using nonlinear least squares.
  #
  fit <- nls(y ~ exp(a * log.x + b * log.z), df, list(a=start[1], b=start[2]))
  #
  # Test a hypothesis.
  #
  cc <- vcov(fit)
  s <- sqrt((H.0 %*% cc %*% H.0))
  (crossprod(H.0, coef(fit)) - mu) / s
})
#
# Display the simulation results.
#
summary(lm(sort(sim) ~ 0 + qt(ppoints(length(sim)), df=n-2))) # Slope should be near 1
qqplot(qt(ppoints(length(sim)), df=n-2), sim,
       pch=21, bg="#00000010", col="#00000040",
       xlab="Student t reference value",
       ylab="Test statistic")
abline(0:1, col="Red", lwd=2)
Best Answer
These models are not the same, because the first is a rational function of $x$ and the second is an exponential function. The second one is truly a "logistic" model but the first is not. Moreover, the claims about $B$ and $C$ in the first model are not true. The maximum slope depends on $C$, which is a scale parameter, not a location parameter; and although the maximum slope does depend on $B$, it does so in a complicated fashion. For instance, when $B = 2$, the maximum slope equals $\frac{9}{8\sqrt{3}}$.
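That value for $B = 2$ can be checked numerically. A sketch, assuming the first model has the rational form derived below and taking $A - D = 1$ and $C = 1$ for the check, so that it reduces to $y = D + 1/(1 + x^2)$:

```r
# Maximum absolute slope of y = D + 1/(1 + x^2) should equal 9/(8*sqrt(3)) ~= 0.6495.
x <- seq(0.01, 5, length.out = 1e5)
slope <- abs(diff(1 / (1 + x^2)) / diff(x))
c(max(slope), 9 / (8 * sqrt(3)))    # should agree to several decimals
```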
If you write $x = \log(z)$ and $C = \log(\gamma)$ then the second model is
$$y = D + \frac{A-D}{1 + \exp(B(\log(z)-\log(\gamma)))} = D + \frac{A-D}{1 + (z/\gamma)^B},$$
which does have the form of the first model. In other words, the first model is a logistic function of the logarithm of its argument.
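A quick numerical confirmation of this identity (a sketch with arbitrary illustrative parameter values):

```r
# The two parameterizations agree when x = log(z) and C = log(gamma).
A <- 1; D <- 0.2; B <- 3; gamma <- 2                      # illustrative values
z  <- seq(0.1, 10, length.out = 201)
f1 <- D + (A - D) / (1 + (z / gamma)^B)                   # rational form
f2 <- D + (A - D) / (1 + exp(B * (log(z) - log(gamma))))  # logistic form in log(z)
max(abs(f1 - f2))                                         # essentially zero
```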