Logistic regression with box constraints has served me well in the past for problems similar to yours. You are interested in prediction, not in inference, so, as long as a suitable estimate of generalization error (for example, the 5-fold cross-validation error) is low enough, you should be fine.
Let's consider a logit link function and a linear model:
$$\beta_0+\beta_1 x_1 +\beta_2 x_2 =\log{\frac{\mu}{1-\mu}}$$
where $\mu=\mathbb{E}[y\mid x_1,x_2]$.
Solving for $\mu$ gives $\mu=\frac{1}{1+\exp{(-\beta_0-\beta_1 x_1 -\beta_2 x_2)}}$, and therefore
$$\frac{\partial \mu}{\partial x_1}=\beta_1 \frac{\exp{(-\beta_0-\beta_1 x_1 -\beta_2 x_2)}}{(1+\exp{(-\beta_0-\beta_1 x_1 -\beta_2 x_2)})^2}>0 \iff \beta_1>0 $$
Thus constraints 1 and 2 are satisfied if you just use logistic regression with the constraint $\beta_1>0$. In general, monotonicity constraints with respect to one or more variables are relatively easy to enforce with GLMs (Generalized Linear Models) such as logistic regression, because the monotonicity of the link function, together with the fact that its argument is a linear function of the predictors, implies that $\mu$ is always monotonic with respect to the continuous predictors.
An R package which supports logistic regression with box constraints (constraints of the type $a_i\leq\beta_i\leq b_i$) is glmnet. Its usage is a bit different from other regression functions in R, so have a look at ?glmnet.
Constraint 3 wouldn't need specific attention in most cases, because most R regression functions will automatically encode categorical variables using dummy variables. Unfortunately, glmnet is one of the few functions which doesn't do that. You need to use model.matrix to solve this: if my_data holds your observations $X=\{x_{1i},x_{2i}\}_{i=1}^N$, then

M <- model.matrix(~ x1 + x2, my_data)

will build a design matrix suitable for use with glmnet.
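For illustration, a minimal sketch of such a fit might look as follows, using glmnet's lower.limits argument for the box constraint; my_data, x1, x2 and the binary response y are placeholders for your own data, and the details are an assumption rather than code from your setup:

# Cross-validated logistic regression with the box constraint beta_1 >= 0.
library(glmnet)

M <- model.matrix(~ x1 + x2, my_data)[, -1]  # drop the intercept column; glmnet adds its own
lower <- rep(-Inf, ncol(M))
lower[colnames(M) == "x1"] <- 0              # constraints 1 and 2: beta_1 >= 0
cv_fit <- cv.glmnet(M, my_data$y, family = "binomial",
                    nfolds = 5, lower.limits = lower)
coef(cv_fit, s = "lambda.min")               # coefficients at the lambda with lowest CV error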
The only limitation of this approach is that we have modeled the logit of $\mu$ as a linear function of the predictors. This may not be flexible enough for your problem: in other words, you could get a large cross-validation error. If so, you should look into nonparametric logistic regression - here, however, you need to fit GAMs (Generalized Additive Models), not GLMs, and imposing monotonicity becomes more complicated. The package mgcv and the function mono.con are your friends here - you'll need to read quite a lot of documentation. Gavin Simpson's answer to the question How to smooth data and force monotonicity, which you linked in your question, has a good example.
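As a starting point, a minimal sketch of an unconstrained binomial GAM in mgcv is below (my_data, x1, x2 and y are placeholders); enforcing monotonicity of the smooth of x1 additionally requires mono.con() and pcls(), as shown in Gavin Simpson's answer:

library(mgcv)

# Unconstrained binomial GAM: a smooth effect of x1, a parametric effect of x2.
gam_fit <- gam(y ~ s(x1) + x2, family = binomial, data = my_data)
summary(gam_fit)
plot(gam_fit, pages = 1)   # inspect the fitted smooth of x1 for (non-)monotonicity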
Finally, I reiterate that this approach (as well as all other approaches which rely on logistic regression, whether Bayesian or frequentist) only makes sense because you need a quick tool to approximate, in an automated way, multiple unknown functions inside your reinforcement learning workflow. $y|\mathbf{x}$ doesn't really have a binomial distribution, so you cannot expect to get realistic estimates of standard errors, confidence intervals, etc. If you need a real statistical model, which would give you not only point estimates but also realistic prediction intervals, then you need to take into account the real conditional distribution of your output. This question might help:
Judging the quality of a statistical model for a percentage
The question is not clear to me, but if you want to use a formula instead of matrix input, you can use the caret package, which provides a universal interface to many packages including glmnet. Or you can use model.matrix to convert the formula to matrix form.
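A minimal sketch of the caret route (illustrative only; my_data, x1, x2 and the factor response y are placeholders):

library(caret)

# caret's formula interface handles dummy coding internally and dispatches to glmnet.
ctrl <- trainControl(method = "cv", number = 5)
fit  <- train(y ~ x1 + x2, data = my_data,
              method = "glmnet", trControl = ctrl)
fit$bestTune                     # alpha and lambda chosen by cross-validation
predict(fit, newdata = my_data)  # class predictions on the original data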
If you want to fit a logistic regression with $\lambda=0$, you can directly use the traditional glm function. If you want to implement it manually, my answer here has code to calculate the objective function and gradient of the logistic loss and uses BFGS to optimize.
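A minimal sketch of that idea (my own illustration, not the code from the linked answer): minimize the negative log-likelihood with BFGS via optim(), where X is a numeric design matrix with an intercept column and y is a 0/1 response vector.

# Negative log-likelihood of the logistic model.
neg_loglik <- function(beta, X, y) {
  eta <- as.vector(X %*% beta)
  sum(log1p(exp(eta)) - y * eta)
}

# Its gradient: X' (p - y), with p the fitted probabilities.
neg_loglik_grad <- function(beta, X, y) {
  p <- 1 / (1 + exp(-as.vector(X %*% beta)))
  as.vector(t(X) %*% (p - y))
}

fit <- optim(par = rep(0, ncol(X)), fn = neg_loglik, gr = neg_loglik_grad,
             X = X, y = y, method = "BFGS")
fit$par   # should closely match coef(glm(y ~ X - 1, family = binomial))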
For the revised question: projected gradient descent may work well, since projection onto box constraints is easy to compute. My answer here gives some details on projected gradient descent (I believe projected gradient descent is used by glmnet):
Solving constrained optimization problem: projected gradient vs. dual?
An additional reference for projected gradient descent is Statistical Learning with Sparsity, a book by the glmnet authors, page 120.
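A rough sketch of projected gradient descent for logistic regression with box constraints lower <= beta <= upper (illustrative only; this is not the solver glmnet actually uses). X is a design matrix and y a 0/1 response vector.

projected_gd <- function(X, y, lower, upper, step = 1e-4, iters = 10000) {
  beta <- pmin(pmax(rep(0, ncol(X)), lower), upper)     # feasible starting point
  for (i in seq_len(iters)) {
    p    <- 1 / (1 + exp(-as.vector(X %*% beta)))       # predicted probabilities
    g    <- as.vector(t(X) %*% (p - y))                 # gradient of the logistic loss
    beta <- pmin(pmax(beta - step * g, lower), upper)   # gradient step, then project onto the box
  }
  beta
}

# Example call, assuming three columns (intercept, x1, x2) and beta_1 >= 0:
# b_hat <- projected_gd(X, y, lower = c(-Inf, 0, -Inf), upper = rep(Inf, 3))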
Best Answer
Regardless of what your computing platform may be, you can trick any linearly constrained linear model solver into doing what you want without having to modify any code at all. Observe that the model
$$\mathbb{E}(Y) = X \beta$$
is equivalent to the model
$$\mathbb{E}(Y + X\gamma) = X(\beta + \gamma)$$
for any fixed coefficients $\gamma$, because the constant vector $X\gamma$ has been added to both sides. Moreover, fitting the second model will produce the same covariance matrix of estimates, etc., as the first, because no change has been made in the errors whatsoever. Consequently, after estimating $\beta+\gamma$ as $\widehat{\beta+\gamma}$ using any linear procedure (not just nnls), the estimate of $\beta$ is obtained as
$$\hat\beta = \widehat{\beta+\gamma} -\gamma.$$
Exploit this flexibility to make sure that some of the coefficients of $\widehat{\beta+\gamma}$ will naturally be positive (and therefore not constrained by nnls). This can be done by estimating what those coefficients might be, performing the adjusted procedure, and checking that it has not constrained the corresponding estimates. If any have been constrained, increase the amount of adjustment and repeat until success is achieved.
I propose using an initial ordinary least squares fit to start the procedure. To be conservative, change the estimates by some small multiple $\rho$ of their standard errors. If iteration is needed, keep increasing $\rho$; the following code doubles $\rho$ at each iteration. The entire algorithm is contained within the short repeat block in the middle of the code example below.
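A minimal sketch of that adjust-and-refit loop (assuming a design matrix X with an intercept column, a response y, a logical vector free marking the coefficients that should not be constrained, and the nnls package; the names and details are illustrative rather than the answer's original code):

library(nnls)

partial_nnls <- function(X, y, free, rho = 2) {
  ols <- lm(y ~ X - 1)                       # initial OLS fit for starting estimates
  b   <- coef(ols)
  s   <- coef(summary(ols))[, "Std. Error"]
  repeat {
    # Shift the unconstrained coefficients far enough that their shifted
    # estimates should come out positive; constrained ones are left alone.
    gamma   <- ifelse(free, pmax(0, rho * s - b), 0)
    shifted <- nnls(X, as.vector(y + X %*% gamma))$x   # estimates beta + gamma >= 0
    if (all(shifted[free] > 0)) break        # no free coefficient pinned at zero: done
    rho <- 2 * rho                           # otherwise shift more and refit
  }
  shifted - gamma                            # recover the estimate of beta
}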
As an example, I generated a problem with $200$ observations of seven variables (and included a constant term). The following tableau summarizes the results:
The true values of $\beta$ are listed first. The first half are negative, the second half positive. These are followed by their OLS estimates, their NNLS estimates, and the modified NNLS estimates, wherein the first two coefficients (AIntercept and AX1) were not constrained to be positive. The last two lines summarize the constraints, printing "1.0" where positivity constraints were applied. The constraints actually imposed in the solution, using the same indicator method, appear last. In this case the procedure worked well.