Solved – R – how to get standard error for a breakpoint/parameter in a piecewise regression

Tags: jags, nls, piecewise linear, segmented regression, standard error

I'm new to asking questions here, so I hope this is a reasonable place for this.

I am trying to fit a piecewise regression. I expect my response (y) to increase with x up to a breakpoint and to plateau thereafter, i.e., the second slope is 0. The model I would like to fit has 3 parameters: intercept, slope, and breakpoint. In a best-case scenario, I would compare this model to an intercept-only/null model using AIC or some similar method.

I started with the segmented package, using this hint to fix the second slope to 0:
https://stat.ethz.ch/pipermail/r-help/2007-July/137625.html
But after updating a bunch of packages, the whole thing (which was already a bit fussy) stopped working altogether.

So now I'm using optim(), but I'm not sure how to appropriately get an estimate of error, particularly around the breakpoint parameter. See the code below for attempts using bbmle and nls (both work on the simulated data here, but both seem to fail on my real data sets).

According to a SAS-using colleague, proc nlin seems to be able to fit this model easily & generates standard error estimates for the parameters… but, I'm determined to Not use SAS.

This exchange (and others) seems to indicate that there are general concerns with error estimates for this type of model, particularly the breakpoint parameter, and it references a potentially useful post by Venables that is unfortunately no longer available:
Change point analysis using R's nls()

(Note: my data are weighted by a variable that represents effort. Where a weights= argument is available I supply wt to it; otherwise I use an expanded dataset or weight the sum of squares in the model.)

Simulate segmented data with diminishing sample size

set.seed(15)
a.sim <- 0    # intercept
b.sim <- 0.5  # slope for segment 1
n     <- 12
brk   <- 4    # breakpoint

x <- 1:n
y <- numeric(n)
y[1:brk] <- a.sim + b.sim * x[1:brk] + rnorm(brk, 0, .2)
y[(brk + 1):n] <- (a.sim + b.sim * brk) + rnorm(n - brk, 0, .2)  # second, flat segment
y[n] <- y[n] - .30 * y[n]  # subtract from last point, as these are often outside of the normal pattern
wt <- c(rep(50, n - 4), 40, 40, 35, 5)  # weight variable

dat <- data.frame(x, y, wt)                           # make dataframe
dat.expand <- dat[rep(seq_len(nrow(dat)), dat$wt), ]  # second dataset with rows repeated based on weight

with(dat, plot(x, y, ylim = c(0, max(y)), pch = 16, cex = wt/10))  # plot, with symbol size representing the weight variable

Use optim to solve

mod <- function(par, data){
  a      <- par[1]
  b      <- par[2]
  x.star <- par[3]
  yfit <- rep(NA_real_, length(data$y))      # vector of fitted values
  small.x <- data$x <= x.star
  yfit[small.x]  <- a + b * data$x[small.x]  # rising segment
  yfit[!small.x] <- a + b * x.star           # flat segment after the breakpoint
  sum((data$y - yfit)^2)                     # weighted version: sum(data$wt * (data$y - yfit)^2)
}
fit1 <- optim(par = c(a = .5, b = .5, x.star = 3), mod, data = dat, hessian = TRUE)

This almost always works and provides a reasonable fit. For production runs, I feed it a large table of potential start values to find a reasonable starting point before actually optimizing.
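
Since optim() is called with hessian=TRUE, one rough option is to back Wald-type standard errors out of the Hessian. A minimal sketch, assuming the usual least-squares approximation that the covariance matrix is roughly 2 * sigma2 * solve(H) near the minimum, with sigma2 = SSR/(n - p):

n.obs  <- nrow(dat)
p      <- length(fit1$par)
sigma2 <- fit1$value / (n.obs - p)          # residual variance, SSR/(n - p)
vc     <- 2 * sigma2 * solve(fit1$hessian)  # approximate covariance matrix
sqrt(diag(vc))                              # approximate SEs for a, b, x.star

The catch, echoed in the exchange linked above, is that the sum of squares is not smooth in x.star, so the Hessian-based SE for the breakpoint in particular should be treated with suspicion.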

bbmle for CIs

I talked to a statistician, who recommended this as a possible method for generating confidence intervals around the breakpoint. It works for my data in some instances, but usually gives errors.

library(bbmle)

mod2 <- function(a, b, x.star, sig){
  yfit <- rep(NA_real_, length(y))
  small.x <- x <= x.star
  yfit[small.x]  <- a + b * x[small.x]
  yfit[!small.x] <- a + b * x.star
  -sum(dnorm(y, yfit, sig, log = TRUE))  # negative log-likelihood
}

fit3 <- mle2(mod2, start = list(a = 0, b = 0.5, x.star = brk, sig = 0.2),
             data = dat)
ci <- confint(fit3)  # profile confidence intervals
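
When profiling errors out (common when the breakpoint is weakly identified), bbmle can fall back on the quadratic (Wald) approximation, and the fit can be compared to the intercept-only null mentioned above via AIC. A sketch, assuming fit3 converged (null.mod is a name I'm introducing here, not something from my original code):

summary(fit3)                   # estimates with asymptotic standard errors
confint(fit3, method = "quad")  # Wald intervals, no profiling required

# hypothetical intercept-only null model, for the AIC comparison
null.mod <- mle2(y ~ dnorm(mean = a, sd = sig),
                 start = list(a = mean(dat$y), sig = sd(dat$y)), data = dat)
AICtab(fit3, null.mod)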

nls method

This basically never works for my kind of data (it works if I simulate much larger datasets). I get either the singular gradient matrix error or:
Error in …. step factor 0.000488281 reduced below 'minFactor' of 0.000976562…

nls.mod <- nls(y ~ ifelse(x < x.star, a + b * x, a + b * x.star),
               data = dat, weights = wt,
               start = c(x.star = brk, b = 0.4, a = 0))
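
For what it's worth, when nls() does converge, summary() reports standard errors for all three parameters directly. And a sketch of one thing that sometimes helps convergence (an assumption on my part, not a guarantee): the "port" algorithm with box constraints keeping x.star inside the observed x range:

summary(nls.mod)  # parameter estimates with standard errors, when it converges

# constrained refit; bounds are ordered to match the start vector
nls.mod2 <- nls(y ~ ifelse(x < x.star, a + b * x, a + b * x.star),
                data = dat, weights = wt,
                start = c(x.star = brk, b = 0.4, a = 0),
                algorithm = "port",
                lower = c(x.star = min(dat$x), b = -Inf, a = -Inf),
                upper = c(x.star = max(dat$x), b = Inf, a = Inf))

The underlying fragility remains, though: the gradient with respect to x.star is discontinuous, which is what drives the singular-gradient and minFactor errors.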

JAGS?

So, I also have this working in JAGS/r2jags, again with help from a statistician. I would rather not use this approach if there is a straightforward, frequentist way to get a reasonable estimate of error around the breakpoint. I'm not looking for perfection here, just something reasonable.
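
For completeness, a minimal sketch of the kind of model this involves (the priors, iteration counts, and names here are placeholders, not the settings from my actual analysis); the posterior SD of x.star then plays the role of the standard error:

library(R2jags)

# hypothetical minimal model: rising line that goes flat after x.star
piecewise.bugs <- function(){
  for(i in 1:n){
    mu[i] <- a + b * min(x[i], x.star)
    y[i] ~ dnorm(mu[i], tau)
  }
  a ~ dnorm(0, 0.001)
  b ~ dnorm(0, 0.001)
  x.star ~ dunif(min.x, max.x)  # breakpoint constrained to the observed range
  tau ~ dgamma(0.001, 0.001)
}

jags.dat <- list(y = dat$y, x = dat$x, n = nrow(dat),
                 min.x = min(dat$x), max.x = max(dat$x))
fit.jags <- jags(data = jags.dat, parameters.to.save = c("a", "b", "x.star"),
                 model.file = piecewise.bugs, n.chains = 3, n.iter = 5000)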

Best Answer

Hmm, so I think I figured out the issues I was having with segmented. It had to do with the weights: it doesn't work to weight both the lm fit and the segmented model.

Segmented seems like the best bet for me. It does a good job estimating breakpoints, even with my short datasets. It isn't terribly difficult to constrain the second slope to 0, and it has a built-in test for the significance of the change in slope. If anyone feels like explaining the difficulty of estimating error around a breakpoint estimate, I'm all ears!

library(segmented)

with the second segment not constrained to 0

out.lm <- lm(y ~ x, data = dat)
o <- segmented(out.lm, seg.Z = ~x, weights = wt, psi = 10)

with(dat, plot(x, y, ylim = c(0, max(y)), pch = 16, cex = wt/13, main = "segmented"))
lines(x = dat$x, y = fitted(o), col = "blue")
lines.segmented(o, col = "blue")

fix from Vito to constrain the second slope to 0

o2 <- lm(y ~ 1)  # intercept-only fit
xx <- -x         # reverse the sign of x, so the zero-slope piece lands after the breakpoint on the original scale
o3 <- segmented(o2, seg.Z = ~xx, weights = wt, psi = list(xx = -4))

points(x, fitted(o3), col = "green")
ci <- confint(o3, rev.sgn = TRUE)  # CI for the breakpoint, back on the original x scale
lines.segmented(o3, col = "green", rev.sgn = TRUE, lwd = 2)

test for slope change

davies.test(o3,~xx)
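
And, back to the title question: segmented reports a standard error for the breakpoint directly. One way to pull it out (remember the estimate lives on the reversed xx scale, so flip the sign to return to x):

summary(o3)        # prints the breakpoint estimate with its standard error
o3$psi             # matrix with columns Initial, Est., St.Err (on the xx scale)
-o3$psi[, "Est."]  # breakpoint back on the original x scale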