Polynomial Regression – Adding Polynomial Slope Constraint in R Linear Regression

Tags: constrained-regression, polynomial, r, regression

My problem is how to find the best decreasing 3rd degree polynomial regression in R.
I have data, let's say

x <- c(1:5, 17, 23, 30, 35, 36)
x <- x/max(x)
y <- c(600, 555, 400, 333, 332, 331, 330, 214, 210, 190)

Here I want to find the best polynomial fitting these points, with the constraint that the polynomial must be decreasing over the whole interval (here the interval is from 0 to 1). When I use a simple lm

m <- lm(y ~ I(x^3) + I(x^2) + x)

the polynomial is not strictly decreasing.
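For instance, one way to see this (a quick sketch using the fit above) is to evaluate the fitted derivative on a grid over [0, 1]:

co <- coef(m)                     # (Intercept), I(x^3), I(x^2), x
xs <- seq(0, 1, length.out = 200)
dydx <- 3 * co["I(x^3)"] * xs^2 + 2 * co["I(x^2)"] * xs + co["x"]
any(dydx > 0)                     # TRUE whenever the fit is increasing somewhere on [0, 1]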

Simple math: I want to find a polynomial $$ y = ax^3 + bx^2 + cx + d $$

This polynomial is decreasing when

$$ \frac{dy}{dx} < 0, $$

which means

$$ 3ax^2 + 2bx + c < 0 $$.

From here I can find the constraints, that are

  1. $ a < (-2bx - c)/(3x^2) $,
  2. $ b < (-3ax^2 - c)/(2x) $,
  3. $ c < -3ax^2 - 2bx $.

Is there any way to input these constraints into lm?

PS: I found this link with a similar question, but my problem is a little more complex. I hope somebody can help!

EDIT: I tested the answer below on this new data set and it doesn't work. Why?

x1 <- c(0.01041667, 0.30208333, 0.61458333, 0.65625000, 0.83333333)
y1 <- c(772, 607, 576, 567, 550)

Best Answer

First of all, let's write the unconstrained model so that the coefficients are consistently ordered (in this case, from lowest to highest degree):

unconstrained_model <- lm(y ~ x + I(x^2) + I(x^3))

Secondly, constrained least-squares regression always has a higher RMSE on the data sample than unconstrained least squares, unless the latter already satisfies the constraints, in which case the two solutions coincide. In practice, if the data don't respect the constraints very well, the constrained fit can be quite bad.

In your case the data are decreasing, which is good. But you also have four data points where the ordinate is nearly constant (relative to the range of $y$, $I_y = [\min(y),\max(y)]$), even though the corresponding variation in $x$ is considerable relative to $I_x = [\min(x),\max(x)]$. In other words, you have a large plateau in the center of your sample:

df <- data.frame(x,y)
library(ggplot2)
(p <- ggplot(data=df,aes(x=x,y=y)) + geom_line()+geom_point()+geom_smooth())

[Plot of the data with a loess smooth, showing the large central plateau]

This is not a pattern which can be easily fit by a monotone decreasing third degree polynomial.

Having said that, let's solve the constrained regression problem. First of all, let's get explicit inequalities for the constraints in terms of the model coefficients. You already noted that the third-degree polynomial is decreasing on $I=[0,1]$ if and only if

$$3ax^2+2bx+c <0 \quad \forall x \in I $$

Note that $d$ doesn't appear in this inequality, and the reason is obvious: whether our model (the third-degree polynomial) is increasing or not doesn't depend on the value of the intercept. Since $3ax^2+2bx+c$ must be negative at $x=0$, we get one of the constraints, $c<0$. Now, to get the other constraint inequalities, we just need to make the substitutions

$$ t_1=x,\quad t_2=x^2$$

and note that

$$x\in[0,1]\Rightarrow(t_1,t_2)\in [0,1]\times [0,1]$$

We are then led to the simpler problem of imposing a negativity constraint on a linear (degree one) polynomial in two variables:

$$f(t_1,t_2)=3at_2+2bt_1+c <0 \quad \forall (t_1,t_2) \in [0,1]\times[0,1] $$

$f$ is linear and its domain, the unit square $[0,1]^2$, is convex, so if $f$ is negative at the four corners $(0,0)$, $(0,1)$, $(1,0)$, $(1,1)$, it is negative everywhere in $[0,1]^2$. Actually, the only possible values for $(t_1,t_2)$ are those in the lower-right half of the square (the triangle with corners $(0,0)$, $(1,0)$, $(1,1)$), because

$$x \in [0,1] \Rightarrow x^2 \le x \Leftrightarrow t_2 \le t_1 $$

Thus, we only need to impose that $f$ is negative at the three corners $(0,0)$, $(1,0)$, $(1,1)$ of that triangle. Explicitly, $f(0,0)=c$, $f(1,0)=2b+c$ and $f(1,1)=3a+2b+c$, so we finally obtain the three conditions

$$c<0,\quad 2b+c<0,\quad 3a+2b+c<0$$

Perfect! Now we have three linear inequality constraints which express the condition that our model is decreasing on $I$. To solve a linear least-squares problem with equality and/or inequality constraints, R offers a package with a simple and intuitive interface, limSolve.
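For reference, the three inequalities (relaxed to non-strict ones) can be collected into the form $G\beta \ge h$ that lsei expects, where $\beta = (d, c, b, a)^\top$ matches the column order of the design matrix built below:

$$
\begin{pmatrix} 0 & -1 & -2 & -3 \\ 0 & -1 & -2 & 0 \\ 0 & -1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} d \\ c \\ b \\ a \end{pmatrix} \ge
\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
$$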

library(limSolve)
# design matrix: columns 1, x, x^2, x^3, so the coefficient vector is (d, c, b, a)
A <- cbind(rep(1, length(x)), x, x^2, x^3)
b <- y
# inequality constraints G %*% beta >= h, i.e.
#   3a + 2b + c <= 0,   2b + c <= 0,   c <= 0
G <- matrix(nrow = 3, ncol = 4, byrow = TRUE,
            data = c(0, -1, -2, -3,
                     0, -1, -2,  0,
                     0, -1,  0,  0))
h <- rep(0, 3)
constrained_model <- lsei(A = A, B = b, G = G, H = h, type = 2)
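As a quick sanity check (a sketch, not part of the original workflow), we can verify that the constrained coefficients do give a non-increasing polynomial on $[0,1]$:

beta <- constrained_model$X            # (d, c, b, a), same order as the columns of A
xs <- seq(0, 1, length.out = 200)
deriv <- beta[2] + 2 * beta[3] * xs + 3 * beta[4] * xs^2
all(deriv <= 1e-8)                     # should be TRUE: the constraints force the derivative <= 0 on [0, 1]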

Great, now let's get predictions for both models. lm objects have a predict method, but no equivalent method is available for the results of lsei. Thus we'll use a different approach and define a my_predict function

my_predict <- function(x, coefficients) {
  # same column order as the design matrix A: 1, x, x^2, x^3
  X <- cbind(rep(1, length(x)), x, x^2, x^3)
  X %*% coefficients
}

Then

# compute predictions on a grid over [0, 1]
xpred <- seq(0, 1, length.out = 100)
predictions_constrained <- my_predict(xpred, constrained_model$X)
predictions_unconstrained <- my_predict(xpred, unconstrained_model$coefficients)
df2 <- data.frame(xpred, predictions_unconstrained, predictions_constrained)

# plot results
p <- ggplot(data = df, aes(x = x, y = y, color = "data")) +
  geom_point() +
  geom_line(data = df2, aes(x = xpred, y = predictions_unconstrained, color = "unconstrained fit")) +
  geom_line(data = df2, aes(x = xpred, y = predictions_constrained, color = "constrained fit"))
p

[Plot of the data together with the unconstrained and constrained cubic fits]

As expected, the constrained fit is considerably worse than the unconstrained one. We can also compute the $R^2$, and get the same picture:

SS <- sum((y - mean(y))^2)
RSS_unconstrained <- sum((unconstrained_model$residuals)^2)
RSS_constrained <- sum((A %*% constrained_model$X - y)^2)
R2_constrained <- 1 - RSS_constrained/SS
R2_unconstrained <- 1 - RSS_unconstrained/SS

The constrained model explains only about 70% of the total variation, while the unconstrained model explains about 83% of the total variation. As a final note, the fact that the fit is worse on the sample data doesn't necessarily mean that the constrained model is worse than the unconstrained one, in terms of predictive accuracy. If, for example, the ideal model obeys the constraints, it may be that the constrained model has lower test error on unseen data.
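One way to probe that (a minimal leave-one-out cross-validation sketch, not part of the original answer, reusing A, y, G and h from above):

loo_errors <- sapply(seq_along(y), function(i) {
  # refit both models without point i, then predict it
  A_tr <- A[-i, , drop = FALSE]
  fit_u <- lm.fit(A_tr, y[-i])$coefficients
  fit_c <- lsei(A = A_tr, B = y[-i], G = G, H = h, type = 2)$X
  c(unconstrained = (y[i] - sum(A[i, ] * fit_u))^2,
    constrained   = (y[i] - sum(A[i, ] * fit_c))^2)
})
rowMeans(loo_errors)   # mean squared leave-one-out prediction errors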

Concerning the new set of data:

x <- c(0.01041667, 0.30208333, 0.61458333, 0.65625000, 0.83333333)
y <- c(772, 607, 576, 567, 550)
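A sketch of the corresponding calls, reusing G and h from above (the prediction and plotting code are unchanged):

A <- cbind(rep(1, length(x)), x, x^2, x^3)
constrained_model <- lsei(A = A, B = y, G = G, H = h, type = 2)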

Repeating exactly the same calculations (as sketched above) leads to a constrained model whose highest-degree term is zero:

> constrained_model
$X
                  x                     
 759.8756 -449.1166  224.5583    0.0000 

$residualNorm
[1] 0

$solutionNorm
[1] 1854.049

$IsError
[1] FALSE

$type
[1] "lsei"

This polynomial is nonincreasing on $I=[0,1]$, as required, but it has a minimum at $x=1$, so it would start increasing for $x>1$. As a matter of fact, the derivative is $2bx+c$ (since $a=0$), which at $x=1$ evaluates to

> 2*constrained_model$X[3]+constrained_model$X[2]

-6.82121e-13 

which, given the accuracy of the computations involved, is basically 0. The fact that $x=1$ is a minimum is also evident from the plot: enter image description here
