1. How to easily calculate the intercept and slope of each segment?
The slope of each segment is obtained by adding up all the slope coefficients (everything except the intercept) up to and including the current segment. So the slope estimate at $x=15$ is $-1.1003 + 1.3760 = 0.2757$.
The intercept is a little harder, but it's a linear combination of coefficients (involving the knots).
In your example, the second line meets the first at $x=9.6$, so the red point is on the first line at $21.7057 - 1.1003 \times 9.6 = 11.1428$. Since the second line passes through the point $(9.6, 11.1428)$ with slope $0.2757$, its intercept is $11.1428 - 0.2757 \times 9.6 = 8.496$. Of course, you can put those steps together, and it simplifies right down to: intercept of the second segment $= \beta_0 - \beta_2 k_1 = 21.7057 - 1.3760 \times 9.6 = 8.496$.
Can the model be reparameterized to do this in one calculation?
Well, yes, but it's probably easier in general to just compute it from the model.
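To see that the shortcut and the two-step calculation agree, here is the arithmetic from the paragraphs above written out in Python as a check (the names `beta0`, `beta1`, `beta2` are mine; they are just the coefficient estimates quoted above):

```python
# Coefficient estimates quoted from the fitted model above:
# beta0 = intercept, beta1 = slope of segment 1, beta2 = change in slope at the knot
beta0, beta1, beta2 = 21.7057, -1.1003, 1.3760
k1 = 9.6  # knot (breakpoint) position

# Slope of each segment: running sum of the slope coefficients
slope1 = beta1
slope2 = beta1 + beta2           # 0.2757

# Intercept of the second segment via the simplified formula
intercept2 = beta0 - beta2 * k1  # 8.4961

# Two-step version: value of line 1 at the knot, then back out the intercept
y_at_knot = beta0 + beta1 * k1
intercept2_longhand = y_at_knot - slope2 * k1
```

Both routes give the same intercept, which is the point of the simplification.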
2. How to calculate the standard error of the slope of each segment?
Since the estimate is a linear combination of the regression coefficients, $a^\top\hat\beta$, where $a$ consists of 1s and 0s, its variance is $a^\top\text{Var}(\hat\beta)\,a$. The standard error is the square root of that sum of variance and covariance terms.
e.g. in your example, the standard error of the slope of the second segment is:
Sb <- vcov(mod)[2:3, 2:3]  # covariance matrix of the two slope coefficients
sqrt(sum(Sb))              # SE of the second segment's slope
alternatively in matrix form:
Sb <- vcov(mod)                     # full covariance matrix of the coefficients
a <- matrix(c(0, 1, 1), nrow = 3)   # weights picking out beta1 + beta2
sqrt(t(a) %*% Sb %*% a)
3. How to test whether two adjacent segments have the same slope (i.e. whether the breakpoint can be omitted)?
This is tested by looking at that segment's coefficient in the coefficient table. See this line:
I(pmax(x - 9.6, 0)) 1.3760 0.2688 5.120 8.54e-05 ***
That's the change in slope at 9.6. If that change is different from 0, the two slopes aren't the same. So the p-value for a test that the second segment has the same slope as the first is right at the end of that line.
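As a quick sanity check, the $t$ value in that row is just the estimate divided by its standard error (values copied from the table above; this is plain arithmetic, not a refit of the model):

```python
# Change in slope at the knot and its standard error, from the table above
estimate, se = 1.3760, 0.2688

# t statistic for H0: no change in slope at the breakpoint
t_stat = estimate / se  # about 5.12, matching the table up to rounding
```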
If the goal is simply to fit a function, you could treat this as an optimization problem:
y <- c(4.5,4.3,2.57,4.40,4.52,1.39,4.15,3.55,2.49,4.27,4.42,4.10,2.21,2.90,1.42,1.50,1.45,1.7,4.6,3.8,1.9)
x <- c(320,419,650,340,400,800,300,570,720,480,425,460,675,600,850,920,975,1022,450,520,780)
plot(x, y, col="black",pch=16)
#we need four parameters: the two breakpoints and the starting and ending intercepts
fun <- function(par, x) {
#set all y values to starting intercept
y1 <- x^0 * par["i1"]
#set values after second breakpoint to ending intercept
y1[x >= par["x2"]] <- par["i2"]
#which values are between breakpoints?
r <- x > par["x1"] & x < par["x2"]
#interpolate between breakpoints
y1[r] <- par["i1"] + (par["i2"] - par["i1"]) / (par["x2"] - par["x1"]) * (x[r] - par["x1"])
y1
}
#sum of squared residuals
SSR <- function(par) {
sum((y - fun(par, x))^2)
}
library(optimx)
optimx(par = c(x1 = 500, x2 = 820, i1 = 5, i2 = 1),
fn = SSR,
method = "Nelder-Mead")
# x1 x2 i1 i2 value fevals gevals niter convcode kkt1 kkt2 xtimes
#Nelder-Mead 449.8546 800.0002 4.381454 1.512305 0.6404728 373 NA NA 0 TRUE TRUE 0.06
lines(300:1100,
fun(c(x1 = 449.8546, x2 = 800.0002, i1 = 4.381454, i2 = 1.512305), 300:1100))
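For comparison, the same optimization can be sketched in Python with `scipy.optimize.minimize`; this is a direct port of the R code above (same piecewise function, same start values), not the original author's code:

```python
import numpy as np
from scipy.optimize import minimize

y = np.array([4.5, 4.3, 2.57, 4.40, 4.52, 1.39, 4.15, 3.55, 2.49, 4.27, 4.42,
              4.10, 2.21, 2.90, 1.42, 1.50, 1.45, 1.7, 4.6, 3.8, 1.9])
x = np.array([320, 419, 650, 340, 400, 800, 300, 570, 720, 480, 425, 460, 675,
              600, 850, 920, 975, 1022], dtype=float)
x = np.array([320, 419, 650, 340, 400, 800, 300, 570, 720, 480, 425, 460, 675,
              600, 850, 920, 975, 1022, 450, 520, 780], dtype=float)

def fun(par, x):
    """Flat at i1, linear ramp between breakpoints x1 and x2, flat at i2."""
    x1, x2, i1, i2 = par
    y1 = np.full_like(x, i1)                 # start everything at i1
    y1[x >= x2] = i2                         # after the second break: i2
    r = (x > x1) & (x < x2)                  # between the breaks: interpolate
    y1[r] = i1 + (i2 - i1) / (x2 - x1) * (x[r] - x1)
    return y1

def ssr(par):
    """Sum of squared residuals."""
    return np.sum((y - fun(par, x)) ** 2)

res = minimize(ssr, x0=[500, 820, 5, 1], method="Nelder-Mead")
```

With these start values it should converge to roughly the same breakpoints as the `optimx` run above (about 450 and 800).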
Best Answer
The pwlf library has an example specifying bounds for the breakpoint search, which should work for your example.
Fitting unknown breakpoints is a tough optimization problem. I could foresee using soft constraints, i.e. modifying the loss function to penalize a small number of breaks, but hard constraints will be difficult, so I think restricting the search space is the best bet.
Some changepoint formulations do this explicitly by marginalizing out the changepoint (so you only search over a discrete set of changepoint locations); here is an example I have written in pystan.
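The "discrete set of candidate locations" idea can be sketched without any changepoint library: profile out the breakpoint by fitting an ordinary hinge (broken-stick) least-squares regression at each candidate knot on a grid and keeping the best fit. This is a generic single-breakpoint illustration (not the pystan model mentioned above), reusing the `x`/`y` data from the optimization example:

```python
import numpy as np

y = np.array([4.5, 4.3, 2.57, 4.40, 4.52, 1.39, 4.15, 3.55, 2.49, 4.27, 4.42,
              4.10, 2.21, 2.90, 1.42, 1.50, 1.45, 1.7, 4.6, 3.8, 1.9])
x = np.array([320, 419, 650, 340, 400, 800, 300, 570, 720, 480, 425, 460, 675,
              600, 850, 920, 975, 1022, 450, 520, 780], dtype=float)

# Candidate knot locations: a grid over the interior of the data
# (this is the "restricted search space")
knots = np.linspace(np.quantile(x, 0.1), np.quantile(x, 0.9), 50)

def hinge_ssr(k):
    """Least-squares fit of y ~ 1 + x + max(x - k, 0); returns the SSR."""
    X = np.column_stack([np.ones_like(x), x, np.maximum(x - k, 0.0)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

ssr_grid = np.array([hinge_ssr(k) for k in knots])
best_knot = knots[np.argmin(ssr_grid)]
```

Because the grid fit nests the straight line (hinge coefficient zero), the best grid SSR can never be worse than the plain linear fit, and the grid search is trivially robust: no continuous optimizer has to find the kink.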