Constrained Regression – Linear Regression With Slope Constraint


I want to perform a very simple linear regression in R. The formula is as simple as $y = ax + b$. However I would like the slope ($a$) to be inside an interval, let's say, between 1.4 and 1.6.

How can this be done?

Best Answer

I want to perform ... linear regression in R. ... I would like the slope to be inside an interval, let's say, between 1.4 and 1.6. How can this be done?

(i) Simple way:

  • fit the regression. If it's in the bounds, you're done.

  • If it's not in the bounds, set the slope to the nearest bound, and

  • estimate the intercept as the average of $(y - ax)$ over all observations.
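The steps above can be sketched as a small helper function (the name `clamped_lm` is my own, for illustration):

```r
# Fit y ~ x, but clamp the slope to [lo, hi]; if the OLS slope
# already lies in the interval, return the OLS fit unchanged.
clamped_lm <- function(x, y, lo, hi) {
  fit <- lm(y ~ x)
  b <- coef(fit)[["x"]]
  if (b >= lo && b <= hi) {
    return(c(a = coef(fit)[["(Intercept)"]], b = b))
  }
  b <- if (b < lo) lo else hi      # set slope to the nearest bound
  c(a = mean(y - b * x), b = b)    # intercept = average of y - bx
}
```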

(ii) More complex way: do least squares with box constraints on the slope; many optimization routines implement box constraints, e.g. nlminb (which comes with R) does.
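With nlminb this amounts to minimizing the residual sum of squares directly, with bounds on the slope parameter (a sketch; the data and variable names here are made up for illustration):

```r
# Box-constrained least squares via nlminb: minimize the residual
# sum of squares over (intercept, slope), with the slope bounded.
set.seed(1)
x <- runif(40, 10, 30)
y <- 5 + 2 * x + rnorm(40)   # true slope 2, outside [1.4, 1.6]

rss <- function(par) sum((y - par[1] - par[2] * x)^2)

fit <- nlminb(start = c(0, 1.5), objective = rss,
              lower = c(-Inf, 1.4), upper = c(Inf, 1.6))
fit$par   # the slope component ends up at the upper bound, 1.6
```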

Edit: actually, as shown in the example below, in vanilla R nls can do box constraints; it's really very easy to do.

You can also do constrained regression more directly; I believe the pcls function from the package "mgcv" and the nnls function from the package "nnls" both fit constrained regressions.

--

Edit to answer followup question -

I was going to show you how to use it with nlminb since that comes with R, but I realized that nls already uses the same routines (the PORT routines) to implement constrained least squares, so my example below does that case.

NB: in my example below, $a$ is the intercept and $b$ is the slope (the more common convention in stats). I realized after I put it in here that you started the other way around; I'm going to leave the example 'backward' relative to your question, though.

First, set up some data with the 'true' slope inside the range:

 set.seed(seed=439812L)
 x=runif(35,10,30)
 y = 5.8 + 1.53*x + rnorm(35, sd=5)  # population slope is in range
 plot(x,y)
 lm(y~x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x  
     12.681        1.217  

... but the LS estimate is well outside the range, purely due to random variation. So let's use constrained regression via nls:

 nls(y~a+b*x,algorithm="port",
   start=c(a=0,b=1.5),lower=c(a=-Inf,b=1.4),upper=c(a=Inf,b=1.6))

Nonlinear regression model
  model: y ~ a + b * x
   data: parent.frame()
    a     b 
9.019 1.400 
 residual sum-of-squares: 706.2

Algorithm "port", convergence message: both X-convergence and relative convergence (5)

As you see, you get a slope right on the boundary. If you pass the fitted model to summary, it will even produce standard errors and t-values, but I am not sure how meaningful/interpretable these are.

So how does my suggestion (i) compare? (i.e. set the slope to the nearest bound and average the residuals $y - bx$ to estimate the intercept)

 b=1.4
 c(a=mean(y-x*b),b=b)
       a        b 
9.019376 1.400000

It's the same estimate ...

In the plot below, the blue line is least squares and the red line is the constrained least squares:

[Plot: least squares (blue) and constrained least squares (red) fitted lines]