Python – Fit Points with Continuous Piecewise Linear Function with Minimum Points per Segment

constrained regressionpiecewise linearpython

I have a set of two-dimensional data points that I want to fit with a continuous piecewise linear function with one break point. However, I want each of the two segments to be supported by a minimum number of data points. How can I do this in Python?

With the library pwlf, I can't set a minimum number of data points per segment. Defining an interval for the break point would work just as fine, but isn't possible either. A minimum number of data points per line, on the other hand, is possible for linear-tree (which fits a decision tree with linear instead of constant leafs); but here, the result function isn't continuous. (At least I didn't find how to do it in both libraries)

Does anyone have an idea how to achieve this? I was thinking to use linear-tree to obtain a break point, which I will use then as break point in pwlf; but this doesn't seem to be the best way. Any ideas would be apprechiated!

Best Answer

The pwlf library has an example specifying bounds for the breakpoint search, which should work for your example.

Unknown breakpoints is a tough optimization problem, I could foresee using soft constraints to modify the loss function to penalize small N breaks, hard constraints I think will be difficult. So I think restricting search space is the best bet.

Some changepoint formulations do this explicitly by marginalizing out the changepoint (so only search over a discrete set of change point locations), here is an example in pystan I have written.

Related Solutions

R – Calculating Standard Error of Slopes in Piecewise Linear Regression with Known Breakpoints

How to easily calculate the intercept and slope of each segment?

The slope of each segment is calculated by simply adding all the coefficients up to the current position. So the slope estimate at $x=15$ is $-1.1003 + 1.3760 = 0.2757\,$.

The intercept is a little harder, but it's a linear combination of coefficients (involving the knots).

In your example, the second line meets the first at $x=9.6$, so the red point is on the first line at $21.7057 -1.1003 \times 9.6 = 11.1428$. Since the second line passes through the point $(9.6, 11.428)$ with slope $0.2757$, its intercept is $11.1428 - 0.2757 \times 9.6 = 8.496$. Of course, you can put those steps together and it simplifies right down to the intercept for the second segment = $\beta_0 - \beta_2 k_1 = 21.7057 - 1.3760 \times 9.6$.

Can the model be reparameterized to do this in one calculation?

Well, yes, but it's probably easier in general to just compute it from the model.

2. How to calculate the standard error of each slope of each segment?

Since the estimate is a linear combination of regression coefficients $a^\top\hat\beta$, where $a$ consists of 1's and 0s, the variance is $a^\top\text{Var}(\hat\beta)a$. The standard error is the square root of that sum of variance and covariance terms.

e.g. in your example, the standard error of the slope of the second segment is:

Sb <- vcov(mod)[2:3,2:3]
sqrt(sum(Sb))

alternatively in matrix form:

Sb <- vcov(mod)
a <- matrix(c(0,1,1),nr=3)
sqrt(t(a) %*% Sb %*% a)

3. How to test whether two adjacent slopes have the same slopes (i.e. whether the breakpoint can be omitted)?

This is tested by looking at the coefficient in the table of that segment. See this line:

I(pmax(x - 9.6, 0))   1.3760     0.2688   5.120 8.54e-05 ***

That's the change in slope at 9.6. If that change is different from 0, the two slopes aren't the same. So the p-value for a test that the second segment has the same slope as the first is right at the end of that line.

Solved – Piecewise regression with constraints

If the goal is simply to fit a function, you could treat this as an optimization problem:

y <- c(4.5,4.3,2.57,4.40,4.52,1.39,4.15,3.55,2.49,4.27,4.42,4.10,2.21,2.90,1.42,1.50,1.45,1.7,4.6,3.8,1.9)  
x <- c(320,419,650,340,400,800,300,570,720,480,425,460,675,600,850,920,975,1022,450,520,780)  
plot(x, y, col="black",pch=16)


#we need four parameters: the two breakpoints and the starting and ending intercepts
fun <- function(par, x) {
  #set all y values to starting intercept
  y1 <- x^0 * par["i1"]
  #set values after second breakpoint to ending intercept
  y1[x >= par["x2"]] <- par["i2"]
  #which values are between breakpoints?
  r <- x > par["x1"] & x < par["x2"]
  #interpolate between breakpoints
  y1[r] <- par["i1"] + (par["i2"] - par["i1"]) / (par["x2"] - par["x1"]) * (x[r] - par["x1"])
  y1
}

#sum of squared residuals
SSR <- function(par) {
     sum((y - fun(par, x))^2)
   }


library(optimx)
optimx(par = c(x1 = 500, x2 = 820, i1 = 5, i2 = 1), 
       fn = SSR, 
       method = "Nelder-Mead")

#                  x1       x2       i1       i2     value fevals gevals niter convcode kkt1 kkt2 xtimes
#Nelder-Mead 449.8546 800.0002 4.381454 1.512305 0.6404728    373     NA    NA        0 TRUE TRUE   0.06

lines(300:1100, 
      fun(c(x1 = 449.8546, x2 = 800.0002, i1 = 4.381454, i2 = 1.512305), 300:1100))

resulting plot

Best Answer

Related Solutions

R – Calculating Standard Error of Slopes in Piecewise Linear Regression with Known Breakpoints

Solved – Piecewise regression with constraints

Related Question