How to fit a piecewise constant (step-function) model and compare it to a logistic model in R

Tags: aic, logistic, model, nls, regression

I have x, y data where x is position (along a transect) and y is a continuous variable (e.g. temperature).

x<-c(115,116,117,118,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151)
y<-c(68.3864344929207,69.3317985238957,70.5644430507252,68.7245413735345,68.2444929220624,69.2335442981487,68.9683874048675,69.7104166614154,70.6047681836632,71.1807115091657,70.3333333333333,70,69.735105818315,69.6487585375593,70,69.4943374527374,69,70.3590104484011,70.283981899468,68.9087146344906,68.6666666666667,68.5232663969658,68.088410889283,67.567883666713,66.9865494087022,66,66.3333333333333,66.0165138638852,65.6666666666667,65.7465975732667,66.3333333333333,66.6579079386334)

I want to fit a linear model, a four-parameter logistic model, and a piecewise constant function (a.k.a. a step function) with one breakpoint to these data, and compare the models to determine whether y changes gradually, more abruptly than linear, or abruptly in a step-like fashion.

(NOTE: plotting the data reveals that the logistic model is most appropriate in this case, but I will ultimately have to loop across many different dependent variables without this information ahead of time.)

I am using nls() with a manual formula (y ~ int + x*slope) for the linear model and SSfpl for the four-parameter logistic:

l <- nls(y ~ int + x*slope, data = data.frame(x, y), start = list(int = 0, slope = 1)) # linear model
fl <- nls(y ~ SSfpl(x, A, B, xmid, scal), data = data.frame(x, y)) # four-parameter logistic
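
(For the linear model, lm() should give an essentially identical fit without needing starting values, which may be safer when looping over many variables:

l2 <- lm(y ~ x, data = data.frame(x, y)) # equivalent linear fit; AIC()/AICc() accept lm objects too

)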

My problem is fitting the piecewise constant model. There seem to be existing functions for piecewise linear models, but not many for step functions. I found:

library(cumSeg)
pc <- jumpoints(y, k = 1, output = "1")

which gives me the index of an optimal breakpoint in x:

breakpt<-pc$psi

But I'm not sure how to compare this output to the linear or logistic models. For instance, I think I'll have to do the model comparison using AIC, but I'm not sure how to calculate it from the class of object returned by jumpoints(). So I have manually tried to generate the model using the optimal breakpoint:

pcm <- lm(y ~ I(x <= x[breakpt]) + I(x > x[breakpt])) # the two indicators are complementary, so lm() drops one as aliased

Now I can get AIC scores for all the models. But when I plot the fit of my manual step function (pcm) next to the one generated by cumSeg (pc), my manual model's fit doesn't look as good…

par(mfrow = c(1, 2))
plot(x, y)
lines(x, predict(pcm)) # versus
plot(pc)

I'm wondering why this is happening.

I'm also looking for alternative suggestions for the overall problem. On the model-fitting side: the main question is how to tell whether a variable is stepped versus changing more or less linearly. (A logistic model can fit better than a linear one and still have a similar slope at xmid… so that alone isn't really an answer.)

On the model-comparison side: I'm using the AICc() function from the MuMIn package to get the AIC values for the different models, and the akaike.weights() function from the qpcR package to do the delta-AIC "test"… but maybe there are other ideas?
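
Concretely, the comparison step I have in mind looks something like this (a sketch, assuming the three models above all fit, and that akaike.weights() accepts a plain vector of AIC values, which is how I understand it):

library(MuMIn) # for AICc()
library(qpcR)  # for akaike.weights()

# small-sample-corrected AIC for each candidate model, then Akaike weights
aics <- c(linear = AICc(l), logistic = AICc(fl), step = AICc(pcm))
akaike.weights(aics) # delta AIC, relative likelihoods, and weights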

Thanks! Much appreciated!

Best Answer

I think one of the main difficulties is that piecewise constant regressions are not usually called "piecewise constant regressions". They are usually called regression trees, which is a nice visual name, but not particularly googleable if you don't already know what people call them! They can be fit in R with the built-in rpart package (I believe rpart stands for "recursive partitioning", which is our third name for the same concept).

Here's rpart in action on your data:

library(rpart) # ships with R

df <- data.frame(x = x, y = y)
tree <- rpart(y ~ x, data = df)

I wrote a little plot_tree function that shows the predictions:

plot_tree <- function(tree, x, y) {
  s <- seq(110, 155, by = 0.5) # fine grid spanning the data range
  plot(x, y)
  lines(s, predict(tree, data.frame(x = s))) # overlay the step-function predictions
}

which, when applied to the default tree on your data, looks like this

plot_tree(tree, x, y)

[Plot: data with the default rpart step-function fit overlaid]

You can control the granularity of the fit with rpart.control():

tree <- rpart(y ~ x, data=df, control=rpart.control(minsplit=5, cp=.0001))
plot_tree(tree, x, y)

[Plot: data with the finer-grained rpart fit (minsplit = 5, cp = .0001)]
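
One more note, since the question asks about AIC: rpart objects don't have a logLik() method, but under the same Gaussian error model as the nls fits you can compute an AIC-style score from the residual sum of squares yourself. This is a sketch, not the only convention: I count one parameter per leaf mean plus one for the residual variance, and the greedy tree search arguably complicates parameter counting.

rss <- sum((df$y - predict(tree, df))^2) # residual sum of squares
n <- nrow(df)
k <- sum(tree$frame$var == "<leaf>") + 1 # one mean per leaf, plus the residual variance
ll <- -n/2 * (log(2*pi) + log(rss/n) + 1) # Gaussian log-likelihood at the MLE
aic <- -2*ll + 2*k

That should put the tree on roughly the same scale as AIC() applied to your nls fits, which use the same Gaussian likelihood.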
