Solved – Why am I so unsuccessful at predicting with the Generalized Lasso (genlasso {genlasso})?

lasso · prediction · r · regularization

I am trying to use the generalized lasso function genlasso {genlasso} to incorporate an additional penalty matrix into the estimation process.

I started with a "hello world" example: I used a diagonal matrix as the penalty matrix. The predictions I got from the model are very far from what I expected:

  • predictions are negative whereas my response variable is positive.

A regular lasso model from glmnet gives me at least reasonable (positive) predictions, so my question is: what am I missing in the function call, or what do I not understand, such that I end up with such unintuitive predictions from the genlasso {genlasso} model?

Reproducible example.

Step 1. Simulate data based on covariance and mean estimates from my real data (correcting the simulated response for negative values).

# Read objects describing simulation structure 
sim.cov   <- dget(url("https://raw.githubusercontent.com/martakarass/so_q/master/genlasso/sim_cov"))
sim.means <- dget(url("https://raw.githubusercontent.com/martakarass/so_q/master/genlasso/sim_means"))

# Simulate X (explanatory variables) and Y (response) matrices 
# from multivariate normal
library(MASS)
set.seed(1)
sim.sample <- mvrnorm(n = 150, mu = sim.means, Sigma = sim.cov)
sim.sample.x <- sim.sample[, 1:66]
sim.sample.y <- sim.sample[, 67]

# Correct for negative response variable value 
sim.sample.y[sim.sample.y < 0]
# [1] -0.8409024 -1.8470264 -1.8503388
sim.sample.y[sim.sample.y < 0] <- abs(sim.sample.y[sim.sample.y < 0])
summary(sim.sample.y)
# Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
# 0.01549  7.39000 10.93000 10.72000 13.58000 24.77000  

Step 2. Simulate train and test subset indices

train.idc <- as.logical(rbinom(n = nrow(sim.sample), size = 1, prob = 0.8))
test.idc <- !train.idc

Step 3. Build a regular lasso model. Predict values using the best lambda value from cross-validation.

# LASSO
# Cross-validate on a train set 
library(glmnet)
cv.glmnet.out <- cv.glmnet(x = sim.sample.x[train.idc, ], 
                           y = sim.sample.y[train.idc], 
                           nfolds = 10, type.measure = "mse", alpha = 1)
lambda.cv <- cv.glmnet.out$lambda.min
model.lasso <- glmnet(sim.sample.x[train.idc, ], sim.sample.y[train.idc], 
                      alpha = 1)
model.lasso.predicted <- predict(model.lasso, s = lambda.cv, 
                                 newx = sim.sample.x[test.idc, ])

# Investigate predictions from a model
summary(as.vector(model.lasso.predicted))
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#   3.412   9.235  10.850  10.960  12.760  17.950

Step 4. Build a generalized lasso model. For convenience, use the same best lambda as in the regular lasso model.

# LASSO `genlasso`: diagonal penalty matrix 
p <- ncol(sim.sample.x)
library(genlasso)
genlasso.diag.out <- genlasso(y = sim.sample.y[train.idc], 
                              X = sim.sample.x[train.idc, ], 
                              D = diag(1, p))
model.genlasso.diag.predicted <- predict(genlasso.diag.out, lambda=lambda.cv, 
                                         Xnew = sim.sample.x[test.idc, ])$fit

Investigate the predicted values, many of which are negative:

summary(as.vector(model.genlasso.diag.predicted))
#     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
# -27.2800  -6.0070   1.2580   0.3049   6.2230  24.4700 

Best Answer

There are two issues:

  1. glmnet includes an intercept by default, while genlasso does not. To disable the intercept in glmnet, use intercept=FALSE; this reduces performance significantly here. To instead add an intercept to genlasso, cbind a column of ones onto your X matrix and change D accordingly. If you choose to penalize the intercept, you may need to set the corresponding entry of the (p+1)-by-(p+1) D matrix more carefully than in the example code below. Setting penalize.intercept to FALSE in the code below leaves the intercept unpenalized by using a p-by-(p+1) penalty matrix D: an identity matrix with the row corresponding to the intercept removed.
  2. The lambda values for glmnet and genlasso are on different scales. glmnet optimizes $\frac{1}{2N}\sum_{i=1}^N(y_i-\beta_0-x_i^T\beta)^2+\lambda\left[\frac{1-\alpha}{2}\left\|\beta\right\|_2^2+\alpha\left\|\beta\right\|_1\right]$, while genlasso optimizes $\frac12\left\|y-X\beta\right\|_2^2+\lambda\left\|D\beta\right\|_1$. Both use the $L^1$ norm for the lasso penalty, but glmnet's error term is the squared $L^2$ norm divided by $2N$, while genlasso's divides by just $2$, so a $\lambda$ value from one must be rescaled before being applied to the other. (This is also an issue when reusing a CV-selected $\lambda$ to fit a model with a significantly different $N$, which the code below ignores.) We can either scale lambda.cv to the appropriate value for genlasso, or use cross-validation to select an appropriate value directly; the code below takes the latter approach. Note that glmnet provides two rules for selecting the lambda value from cross-validation: the value giving the minimum CV loss (lambda.min), and the 1-standard-error rule, which adds more regularization in exchange for a slightly higher CV loss (lambda.1se); the 1se rule produces better out-of-sample performance in this case.
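To see the rescaling concretely: with $D$ the identity and no intercept, multiplying glmnet's objective by $N$ leaves its minimizer unchanged, so the two problems share solutions when

$$\frac{1}{2N}\left\|y-X\beta\right\|_2^2+\lambda_{\text{glmnet}}\left\|\beta\right\|_1 \quad\Longleftrightarrow\quad \frac12\left\|y-X\beta\right\|_2^2+\underbrace{N\lambda_{\text{glmnet}}}_{\lambda_{\text{genlasso}}}\left\|\beta\right\|_1,$$

i.e. $\lambda_{\text{genlasso}} = N\,\lambda_{\text{glmnet}}$, where $N$ is the number of training observations.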

Code for glmnet without an intercept, genlasso with an intercept, and a CV-selected lambda:

set.seed(42) # set RNG seed for reproducibility 
n.train <- sum(train.idc) # number of training points
n.folds <- 10L # number of CV folds
foldid <- sample(rep_len(seq.int(n.folds), n.train)) # fold number for each training point
## (creates folds with roughly the same number of points)

train.x <- sim.sample.x[train.idc, ]
train.y <- sim.sample.y[train.idc]
test.x <- sim.sample.x[test.idc, ]
test.y <- sim.sample.y[test.idc]

## Call cv.glmnet specifying the foldid's:
cv.glmnet.fit <- cv.glmnet(x = train.x, 
                           y = train.y, 
                           foldid = foldid,
                           type.measure = "mse", alpha = 1)
## Choose the lambda with the minimum CV-estimated loss:
cv.glmnet.lambda.min.pred <- predict(cv.glmnet.fit,
                                     s = cv.glmnet.fit$lambda.min,
                                     newx = test.x)
## Using the 1-standard-error lambda selection rule:
cv.glmnet.lambda.1se.pred <- predict(cv.glmnet.fit,
                                     s = cv.glmnet.fit$lambda.1se,
                                     newx = test.x)
## with no intercept:
cv.glmnet0.fit <- cv.glmnet(x = train.x, 
                            y = train.y, 
                            foldid = foldid,
                            type.measure = "mse", alpha = 1,
                            intercept=FALSE)
cv.glmnet0.lambda.min.pred <- predict(cv.glmnet0.fit,
                                      s = cv.glmnet0.fit$lambda.min,
                                      newx = test.x)
cv.glmnet0.lambda.1se.pred <- predict(cv.glmnet0.fit,
                                      s = cv.glmnet0.fit$lambda.1se,
                                      newx = test.x)


## Get lambda sequence for genlasso on all training data:
D <- diag(1, p+1)
penalize.intercept <- FALSE
if (!penalize.intercept) D <- D[-1,]
genlasso.fit <- genlasso(y = train.y, 
                         X = cbind(1,train.x), 
                         D = D)
## Evaluate each lambda on each fold:
fold.lambda.losses <- tapply(seq_along(foldid), foldid, function(fold.indices) {
  fold.genlasso.fit <- genlasso(y = train.y[-fold.indices],
                                X = cbind(1,train.x[-fold.indices,]),
                                D = D)
  ## length(fold.indices)-by-length(genlasso.fit$lambda) matrix of
  ## predictions for this fold:
  fold.genlasso.preds <- predict(fold.genlasso.fit,
                                 lambda = genlasso.fit$lambda,
                                 Xnew = cbind(1, train.x[fold.indices,]))$fit
  lambda.losses <- colMeans((fold.genlasso.preds - train.y[fold.indices])^2)
  return (lambda.losses)
})
## CV loss for each lambda:
cv.lambda.losses <- colMeans(do.call(rbind, fold.lambda.losses))
cv.genlasso.lambda.min <- genlasso.fit$lambda[which.min(cv.lambda.losses)]
## Caution, this lambda may need rescaling based on the ratio of the full training set size to the fold training set sizes.

## Predict:
cv.genlasso.lambda.min.pred <- predict(genlasso.fit,
                                       lambda = cv.genlasso.lambda.min,
                                       Xnew = cbind(1,test.x))$fit
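
For completeness, the first approach (rescaling rather than re-cross-validating) can be sketched in a couple of lines. This is only an approximation here, since lambda.min was selected with an intercept while the exact factor-of-$N$ correspondence holds for the identity penalty without one; n.train, cv.glmnet.fit, and genlasso.fit are the objects defined above:

## Alternative sketch: rescale the glmnet lambda to genlasso's scale
## (the two objectives differ by a factor of n.train)
cv.genlasso.lambda.rescaled <- cv.glmnet.fit$lambda.min * n.train
cv.genlasso.rescaled.pred <- predict(genlasso.fit,
                                     lambda = cv.genlasso.lambda.rescaled,
                                     Xnew = cbind(1, test.x))$fit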

summary(as.vector(cv.glmnet.lambda.min.pred))
summary(as.vector(cv.glmnet.lambda.1se.pred))
summary(as.vector(cv.glmnet0.lambda.min.pred))
summary(as.vector(cv.glmnet0.lambda.1se.pred))
summary(as.vector(cv.genlasso.lambda.min.pred))

mean((cv.glmnet.lambda.min.pred-test.y)^2)
mean((cv.glmnet.lambda.1se.pred-test.y)^2)
mean((cv.glmnet0.lambda.min.pred-test.y)^2)
mean((cv.glmnet0.lambda.1se.pred-test.y)^2)
mean((cv.genlasso.lambda.min.pred-test.y)^2)