Analysis
According to comments to the question, the "threshold" is determined this way:

> I calculate where the derivative of the curve equals the derivative of the secant, which is the straight line formed by joining the lowest and highest points on the curve.
The "lowest" and "highest" points are at the extremes of the $x$ values and
are derived from the data.
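This condition has a closed form for the power-law model. Writing $f(x) = \hat{a} x^{\hat{b}}$ for the fitted curve and $s$ for the slope of the secant,

$$s = \frac{f(x_{\max}) - f(x_{\min})}{x_{\max} - x_{\min}}, \qquad f'(t) = \hat{a}\hat{b}\,t^{\hat{b}-1} = s \implies t = \left(\frac{s}{\hat{a}\hat{b}}\right)^{1/(\hat{b}-1)},$$

provided $\hat{b} \ne 1$. (The code below solves the same equation numerically with `uniroot`.)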
Proposed solution
Because the secant depends on the extremes of $x$, how those extremes arise is critical.
I therefore suggest estimating the bias in the estimation of the threshold, $t$, as well as a confidence interval for it, using a semi-parametric bootstrap. In this method the model is:
1. The x values, $x_i$, are drawn independently and randomly from a distribution. (It looks like they tend to lie between $0$ and $1$.) This is the "parametric" part of the bootstrap. We will need to explore how the results depend on assumptions about this distribution.

2. The y values, $y_i = a x_i^b$, are computed using the fitted parameters $\hat{a}$ and $\hat{b}$ from the original data in place of the unknown values $a$ and $b$, respectively.

3. To these y values are added "errors" drawn independently and with replacement from the residuals of the original fit. This is the "nonparametric" part of the bootstrap: we assume that the vertical deviations of the original data from the correct curve are independent and identically distributed, and we take the residuals from the original fit to be a representative sample of that distribution.

4. The model is fit to the $(x_i, y_i)$ pairs and a threshold for them is computed as described previously.
The distribution of these thresholds is the bootstrap distribution. Any deviation between its central location and our original estimate of the threshold, $\hat{t}$, gives us an additive bias correction factor. The spread of this distribution lets us erect approximate confidence intervals around the (bias-corrected) estimate.
The bootstrap distribution is estimated by performing a large (but necessarily finite) number of simulations from the model. Usually a thousand or so iterations are enough to give good information.
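As a sketch of this summary step (the function and variable names here are illustrative, not part of the original code), the additive bias correction and a percentile interval can be computed from a vector of bootstrap thresholds like this:

```r
# Summarize a bootstrap distribution of thresholds.
# `thresholds`: vector of bootstrap estimates; `t.hat`: original estimate.
summarize.boot <- function(thresholds, t.hat, level = 0.95) {
  bias <- mean(thresholds) - t.hat   # additive bias estimate
  alpha <- (1 - level) / 2
  # Bias-corrected percentile confidence interval.
  ci <- quantile(thresholds, c(alpha, 1 - alpha)) - bias
  list(bias = bias, estimate = t.hat - bias, ci = ci)
}
```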
Example scenarios
Here are results for three distributions of $X$: the original one (in which the $x_i$ never vary), a uniform distribution, and a Beta distribution that approximates the distribution of the original data. $10^4$ iterations were used.
![Histograms](https://i.stack.imgur.com/iRQKq.png)
Each plot marks the estimated threshold $\hat{t}$ with a vertical red line. Because that line is centered on the "Original X" histogram, we deduce there is little bias and can take the extreme values of this histogram as approximate confidence limits for $t$: roughly from $0.412$ to $0.433$ (95% confidence).
The second and third plots show how exquisitely sensitive $\hat{t}$ is to the distribution of the $x_i$, and how biased an estimator it might be: notice how much wider the range of thresholds is on their horizontal axes.

In the second plot, where the smallest $x_i$ is likely to be close to $0$ and the largest $x_i$ may be substantially less than $1$, the reference slope ("secant") is relatively high, pushing the estimated threshold to the left, where the curve has greater slopes. The bias is large, approximately $+0.12$, merely because the largest $x_i$ in the dataset is $1$ and the smallest, $0.095$, is fairly far from zero.

In the third plot the bias has disappeared (!), but the range of estimated thresholds is still wide: a symmetric 95% confidence interval for $t$ is $[0.33, 0.52]$ (based on $10^4$ iterations). I suspect the lack of bias may be a bit of luck, because the $x_i$ are centered fairly close to the original $\hat{t}$. This can be checked by drawing the $x_i$ from other distributions, such as a Beta(4,2) distribution (centered near $2/3$): the bias is then $-0.21$.
This is about as far as we can reasonably take the analysis without knowing more about how the $x_i$ are determined. The results are interesting insofar as they reveal how important it is to understand how the data arise: there is a huge range of answers available here, all of which are consistent with the numbers, but only some of which could be appropriate for what actually occurred during this study.
Code

R code to produce these results follows. (Change `n.iter` to obtain other than $10^3$ iterations.)
y <-c(0.5745373, 0.8255836, 0.7139635, 0.6318004, 0.8688738, 0.8341626, 0.9278573, 0.6548638,
0.6995722, 0.8410211, 1.0000000, 0.8512973, 0.7917790, 0.5722589, 0.6918069, 0.7117084,
0.5279689, 0.6121315, 0.5869292, 0.7332605, 0.5991816, 0.5470566, 0.5987166, 0.4440854,
0.3719892, 0.4394820, 0.4410862, 0.4966288, 0.4291826, 0.4221613, 0.3951223, 0.3595973,
0.4617549, 0.5106482, 0.5939970, 0.5340835, 0.6036920, 0.5181577, 0.3892170, 0.3667581)
x <- c(0.95149254, 0.57954545, 0.56763699, 0.57089552, 1.00000000, 0.66870629, 0.70833333,
0.53125000, 0.58776596, 0.78061224, 0.61551724, 0.63750000, 0.85000000, 0.52397260,
0.66870629, 0.50328947, 0.52192982, 0.40691489, 0.36016949, 0.37500000, 0.35915493,
0.53968254, 0.36334197, 0.23486842, 0.19615385, 0.35961538, 0.30039267, 0.26424870,
0.54838710, 0.23363874, 0.26936620, 0.09514925, 0.15324519, 0.32465278, 0.33909574,
0.31587838, 0.24401914, 0.28813559, 0.23181818, 0.29272959)
#
# Fit a curve y = a*x^b to the data using nonlinear least squares,
# compute the "threshold," and return its value.
# (Called by `sim`.)
#
fit <- function(x, y, a=1, b=0.5) {
  m <- nls(y ~ a * x^b, start=list(a=a, b=b))
  coeff <- m$m$getPars()
  f <- function(x) coeff[1] * x^coeff[2]
  fp <- function(x) coeff[1] * coeff[2] * x^(coeff[2]-1)
  x.min <- min(x)
  x.max <- max(x)
  secant <- (f(x.max) - f(x.min)) / (x.max - x.min)
  threshold <- uniroot(function(x) fp(x) - secant, c(0, 10^6))$root
  return(threshold)
}
#
# Perform a bootstrap simulation with the specified model:
# `n` is the number of iterations
# `f` is the original estimated curve (as a vectorized function)
# `e` is the array of residuals to the initial fit (it defines an empirical
# distribution of residuals)
# `alpha` and `beta` are parameters for a Beta distribution of the x[i]
# `x` is the array of x[i] to use when not simulating the x[i].
#
sim <- function(n = 1000, f, e, alpha=1, beta=1, x=NULL) {
  replicate(n, {
    if (is.null(x)) {
      x.0 <- rbeta(length(e), alpha, beta)
    } else {
      x.0 <- x
    }
    y.0 <- f(x.0) + sample(e, length(e), replace=TRUE)
    fit(x.0, y.0)
  })
}
#
# Obtain the initial fit.
#
m1 <- nls(y ~ a * x^b, start=list(a=1, b=0.5))
#
# Compute the residuals `e` and the curve function `f`.
#
e <- residuals(m1)
coeff <- m1$m$getPars()
f <- function(x) coeff[1] * x^coeff[2]
#
# Prepare to bootstrap and plot several scenarios.
# Each scenario should take 2-3 seconds for n.iter=1000.
#
par(mfrow=c(1,3))
n.iter <- 10^3
b <- seq(0.15, 0.65, 0.025)
set.seed(17)
#
# Scenario 1: keep the x values fixed.
#
threshold <- sim(n.iter, f, e, x=x)
hist(threshold, main="Original X", freq=FALSE)
abline(v = fit(x, y), col="Red")
quantile(threshold, c(0.025, 0.975)) # A symmetric CI
#
# Scenario 2: vary the x values uniformly in [0,1].
#
threshold <- sim(n.iter, f, e, alpha=1, beta=1)
hist(threshold, main="X ~ Uniform(0,1)", freq=FALSE, breaks=b)
abline(v = fit(x, y), col="Red")
quantile(threshold, c(0.025,0.5, 0.975))
#
# Scenario 3: vary the x values according to Beta(4,4).
#
threshold <- sim(n.iter, f, e, alpha=4, beta=4)
hist(threshold, main="X ~ Beta(4,4)", freq=FALSE, breaks=b)
abline(v = fit(x, y), col="Red")
quantile(threshold, c(0.025, 0.975))
Best Answer
1. You could try the following:

   - You can obtain a p-value for each parameter used in your model using `summary(model)`.
   - You can get a p-value for a model by comparing it to another ("nested") model using `anova(model1, model2)`, where model2 is a simplified version of model1 (it is your null hypothesis).
   - You can use methods such as bootstrapping to get a measure of the goodness of fit of your complete model.

2. For confidence intervals:

   - You can possibly get a full-model confidence interval (this is an approximation) using `library(nls2); predict(as.lm(model2), interval = "confidence")`.
   - You can obtain confidence intervals for the parameters using `confint(model)`.
   - You can get more information about these parameter intervals using `profile(model)` and `plot(profile(model))`.
   - You can obtain the pairwise confidence region for two of your parameters (both for plotting and as a matrix) using `ellipse.nls(model)` from the `ellipse` package.
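As a minimal, self-contained sketch of the first of these approaches, here is an invented example (the data, model, and starting values are made up for illustration):

```r
# Simulated data for illustration: y = 2 * x^0.5 plus noise.
set.seed(1)
x <- runif(50, 0.1, 1)
y <- 2 * x^0.5 + rnorm(50, sd = 0.1)

model1 <- nls(y ~ a * x^b, start = list(a = 1, b = 1))  # full model
model2 <- nls(y ~ a * x,   start = list(a = 1))         # nested null: b fixed at 1

summary(model1)         # parameter estimates with standard errors and p-values
anova(model2, model1)   # F-test comparing the nested models
confint(model1)         # confidence intervals for the parameters a and b
```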