Python – Fit Points with Continuous Piecewise Linear Function with Minimum Points per Segment

constrained regression, piecewise linear, python

I have a set of two-dimensional data points that I want to fit with a continuous piecewise linear function with one break point. However, I want each of the two segments to be supported by a minimum number of data points. How can I do this in Python?

With the library pwlf, I can't set a minimum number of data points per segment. Defining an interval for the break point would work just as well, but that doesn't seem to be possible either. A minimum number of data points per line, on the other hand, is possible with linear-tree (which fits a decision tree with linear instead of constant leaves); but there, the resulting function isn't continuous. (At least I haven't found a way to do either in these libraries.)

Does anyone have an idea how to achieve this? I was thinking of using linear-tree to obtain a break point and then passing it to pwlf as a fixed break point, but this doesn't seem to be the best way. Any ideas would be appreciated!

Best Answer

The pwlf library has an example specifying bounds for the breakpoint search, which should work for your example.
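As a rough sketch of how a minimum point count could be turned into such a breakpoint interval: sort the x values and take the `min_points`-th smallest and `min_points`-th largest as the lower and upper bound. The helper names `breakpoint_bounds` and `fit_with_bounds` are made up for illustration, and the pwlf call assumes that `fit` accepts a `bounds` array with one `[lower, upper]` row per interior breakpoint, as in the library's bounds example.

```python
def breakpoint_bounds(x, min_points):
    """Return (low, high) so that any breakpoint b with low <= b <= high
    has at least `min_points` data points with x <= b and with x >= b."""
    xs = sorted(x)
    if min_points < 1 or 2 * min_points > len(xs):
        raise ValueError("need 1 <= min_points <= len(x) / 2")
    low = xs[min_points - 1]   # min_points points satisfy x <= low
    high = xs[-min_points]     # min_points points satisfy x >= high
    return low, high


def fit_with_bounds(x, y, min_points):
    """Hypothetical wrapper: two-segment pwlf fit with a bounded breakpoint."""
    # Imported lazily so breakpoint_bounds works without pwlf installed.
    import numpy as np
    import pwlf

    low, high = breakpoint_bounds(x, min_points)
    model = pwlf.PiecewiseLinFit(np.asarray(x), np.asarray(y))
    # Assumption: one [lower, upper] row per interior breakpoint.
    breaks = model.fit(2, bounds=np.array([[low, high]]))
    return model, breaks
```

Note that a data point lying exactly on the breakpoint is counted for both segments here; tighten the bounds by one point on each side if that is not acceptable.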

Fitting unknown breakpoints is a tough optimization problem. I could foresee using soft constraints that modify the loss function to penalize breaks that leave too few points in a segment, but hard constraints will, I think, be difficult. So restricting the search space is probably the best bet.

Some changepoint formulations do this explicitly by marginalizing out the changepoint (so they only search over a discrete set of changepoint locations); here is an example in pystan that I have written.
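The idea of searching only a discrete set of locations can also be sketched without Bayesian machinery. The block below is not the pystan marginalization, just a plain least-squares grid search: it tries every midpoint between consecutive sorted x values that leaves at least `min_points` points on each side, fits the continuous model `c0 + c1*x + c2*max(x - b, 0)` for each, and keeps the candidate with the smallest squared error. It uses only the standard library (`numpy.linalg.lstsq` would be the more usual choice), assumes mostly distinct x values, and all function names are made up for illustration.

```python
def solve3(a, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    sol = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        s = m[i][3] - sum(m[i][j] * sol[j] for j in range(i + 1, 3))
        sol[i] = s / m[i][i]
    return sol


def fit_hinge(x, y, b):
    """Least-squares fit of y ~ c0 + c1*x + c2*max(x - b, 0); returns (coeffs, sse)."""
    rows = [(1.0, xi, max(xi - b, 0.0)) for xi in x]
    # Normal equations (A^T A) c = A^T y for the three basis functions.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    coeffs = solve3(ata, aty)
    sse = sum((yi - sum(c * v for c, v in zip(coeffs, r))) ** 2
              for r, yi in zip(rows, y))
    return coeffs, sse


def best_breakpoint(x, y, min_points):
    """Grid search over midpoints; each segment keeps >= min_points points."""
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    n = len(xs)
    if min_points < 2 or 2 * min_points > n:
        raise ValueError("need 2 <= min_points <= len(x) / 2")
    best = None
    # Splitting after index i leaves i + 1 points left and n - i - 1 right.
    for i in range(min_points - 1, n - min_points):
        if xs[i] == xs[i + 1]:
            continue  # no room for a breakpoint between tied x values
        b = 0.5 * (xs[i] + xs[i + 1])
        coeffs, sse = fit_hinge(xs, ys, b)
        if best is None or sse < best[2]:
            best = (b, coeffs, sse)
    if best is None:
        raise ValueError("no admissible breakpoint found")
    return best
```

Calling `b, (c0, c1, c2), sse = best_breakpoint(x, y, min_points=5)` gives a fitted function with slope `c1` left of `b` and `c1 + c2` right of it. Because only midpoints are tried, the result is an approximation of the continuous optimum; the best midpoint could also serve as a starting value or a fixed breakpoint for a finer fit.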
