Solved – Why I am that unsuccessful with predicting with Generalized Lasso (genlasso {genlasso})

lassopredictionrregularization

I try to utilize Generalized Lasso genlasso {genlasso} function to incorporate additional penalty matrix into estimation process.

I started with a "hello world" example: I put a diagonal matrix as a penalty matrix in a model. I realized the predictions I got from the model are very far from what I expected:

• predictions are negative whereas my response variable is positive.

Regular Lasso from glmnet model gives me at least reasonable (positive) predictions, so my question is: what do I miss in function call or what I do not understand that I end up with such unintuitive predictions from genlasso {genlasso} model.

Reproducible example.

Step. 1. Simulate data based on covariance and mean estimates from my real data (correct simulation result for negative response variable values).

# Read objects describing simulation structure
sim.cov   <- dget(url("https://raw.githubusercontent.com/martakarass/so_q/master/genlasso/sim_cov"))
sim.means <- dget(url("https://raw.githubusercontent.com/martakarass/so_q/master/genlasso/sim_means"))

# Simulate X (explanatory variables) and Y (response) matrices
# from multivariate normal
library(MASS)
set.seed(1)
sim.sample <- mvrnorm(n = 150, mu = sim.means, Sigma = sim.cov)
sim.sample.x <- sim.sample[, 1:66]
sim.sample.y <- sim.sample[, 67]

# Correct for negative response variable value
sim.sample.y[sim.sample.y < 0]
# [1] -0.8409024 -1.8470264 -1.8503388
sim.sample.y[sim.sample.y < 0] <- abs(sim.sample.y[sim.sample.y < 0])
summary(sim.sample.y)
# Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
# 0.01549  7.39000 10.93000 10.72000 13.58000 24.77000


Step 2. Simulate train and test subset indices

train.idc <- as.logical(rbinom(n = nrow(sim.sample), size = 1, prob = 0.8))
test.idc <- !train.idc


Step 3. Build regular Lasso model. Predict values with the use of best lambda value from cross-validation.

# LASSO
# Cross-validate on a train set
library(glmnet)
cv.glmnet.out <- cv.glmnet(x = sim.sample.x[train.idc, ],
y = sim.sample.y[train.idc],
nfolds = 10, type.measure = "mse", alpha = 1)
lambda.cv <- cv.glmnet.out$lambda.min model.lasso <- glmnet(sim.sample.x[train.idc, ], sim.sample.y[train.idc], alpha = 1) model.lasso.predicted <- predict(model.lasso, s = lambda.cv, newx = sim.sample.x[test.idc, ]) # Investigate predictions from a model summary(as.vector(model.lasso.predicted)) # Min. 1st Qu. Median Mean 3rd Qu. Max. # 3.412 9.235 10.850 10.960 12.760 17.950  Step 4. Build Generalized Lasso model. For convenience, use same best lambda as in regular LASSO model. # LASSO genlasso: diagonal penalty matrix p <- ncol(sim.sample.x) library(genlasso) genlasso.diag.out <- genlasso(y = sim.sample.y[train.idc], X = sim.sample.x[train.idc, ], D = diag(1, p)) model.genlasso.diag.predicted <- predict(genlasso.diag.out, lambda=lambda.cv, Xnew = sim.sample.x[test.idc, ])$fit


-> investigate predicted values, which are negative

summary(as.vector(model.genlasso.diag.predicted))
#     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
# -27.2800  -6.0070   1.2580   0.3049   6.2230  24.4700


There are two issues:

1. glmnet includes an intercept by default while genlasso does not. To disable the intercept in glmnet, use intercept=FALSE; it will reduce the performance significantly. To instead add an intercept in genlasso, cbind a column of ones onto your X matrix, and change D accordingly. If you choose to penalize it, you might need to set the corresponding value in the p+1 by p+1 D matrix more carefully than in the example code below. By setting penalize.intercept to FALSE in the code below, it is not penalized by using a p by p+1 penalty matrix D which is an identity matrix with the row corresponding to the intercept cut off.
2. the lambda values for glmnet and genlasso are on different scales. Note that glmnet optimizes $\frac{1}{2N}\sum_{i=1}^N(y-\beta_0-x_i^T\beta)^2+\lambda\left[(1-\alpha)\left\|\beta\right\|_2^2+\alpha\left\|\beta\right\|_1\right]$, while genlasso optimizes $\frac12\left\|y-X\beta\right\|_2^2+\lambda\left\|D\beta\right\|_1$; both use the $L^1$ norm for the lasso penalty, but glmnet's error term is equivalent to the $L^2$ norm divided by $2N$, while genlasso's formulation divides by just $2$. The $\lambda$ values from one must be rescaled to be applied to another. This is also an issue when using the CV'd $\lambda$ values for fitting a model with significantly different $N$, which is ignored in the code below. We can either scale lambda.cv to the appropriate value for genlasso, or use cross-validation to select an appropriate value directly. The code below takes the latter approach. Note that glmnet provides two methods of selecting the lambda value from cross-validation: the one giving the minimum CV loss (lambda.min), and one from the 1-standard-error rule, which adds some more regularization in exchange for a bit higher CV loss (lambda.1se); the 1se rule produces better out-of-sample performance in this case.

Code for glmnet without intercept, genlasso with intercept and CV-selected lambda:

set.seed(42) # set RNG seed for reproducibility
n.train <- sum(train.idc) # number of training points
n.folds <- 10L # number of CV folds
foldid <- sample(rep_len(seq.int(n.folds), n.train)) # fold number for each training point
## (creates folds with roughly the same number of points)

train.x <- sim.sample.x[train.idc, ]
train.y <- sim.sample.y[train.idc]
test.x <- sim.sample.x[test.idc, ]
test.y <- sim.sample.y[test.idc]

## Call cv.glmnet specifying the foldid's:
cv.glmnet.fit <- cv.glmnet(x = train.x,
y = train.y,
foldid = foldid,
type.measure = "mse", alpha = 1)
## Choose the lambda with the minimum CV-estimated loss:
cv.glmnet.lambda.min.pred <- predict(cv.glmnet.fit,
s = cv.glmnet.fit$lambda.min, #$
newx = test.x)
## Using the 1-standard-error lambda selection rule:
cv.glmnet.lambda.1se.pred <- predict(cv.glmnet.fit,
s = cv.glmnet.fit$lambda.1se, #$
newx = test.x)
## with no intercept:
cv.glmnet0.fit <- cv.glmnet(x = train.x,
y = train.y,
foldid = foldid,
type.measure = "mse", alpha = 1,
intercept=FALSE)
cv.glmnet0.lambda.min.pred <- predict(cv.glmnet0.fit,
s = cv.glmnet0.fit$lambda.min, #$
newx = test.x)
cv.glmnet0.lambda.1se.pred <- predict(cv.glmnet0.fit,
s = cv.glmnet0.fit$lambda.1se, #$
newx = test.x)

## Get lambda sequence for genlasso on all training data:
D <- diag(1, p+1)
penalize.intercept <- FALSE
if (!penalize.intercept) D <- D[-1,]
genlasso.fit <- genlasso(y = train.y,
X = cbind(1,train.x),
D = D)
## Evaluate each lambda on each fold:
fold.lambda.losses <- tapply(seq_along(foldid), foldid, function(fold.indices) {
fold.genlasso.fit <- genlasso(y = train.y[-fold.indices],
X = cbind(1,train.x[-fold.indices,]),
D = D)
## length(fold.indices)-by-length(cv.genlasso.fit$lambda) matrix, with ## predictions for this fold: ##$
fold.genlasso.preds <- predict(fold.genlasso.fit,
lambda = genlasso.fit$lambda, #$
Xnew = cbind(1, train.x[fold.indices,]))$fit #$
lambda.losses <- colMeans((fold.genlasso.preds - train.y[fold.indices])^2)
return (lambda.losses)
})
## CV loss for each lambda:
cv.lambda.losses <- colMeans(do.call(rbind, fold.lambda.losses))
cv.genlasso.lambda.min <- genlasso.fit$lambda[which.min(cv.lambda.losses)] #$
## Caution, this lambda may need rescaling based on the ratio of the full training set size to the fold training set sizes.

## Predict:
cv.genlasso.lambda.min.pred <- predict(genlasso.fit,
lambda = cv.genlasso.lambda.min,
Xnew = cbind(1,test.x))$fit #$

summary(as.vector(cv.glmnet.lambda.min.pred))
summary(as.vector(cv.glmnet.lambda.1se.pred))
summary(as.vector(cv.glmnet0.lambda.min.pred))
summary(as.vector(cv.glmnet0.lambda.1se.pred))
summary(as.vector(cv.genlasso.lambda.min.pred))

mean((cv.glmnet.lambda.min.pred-test.y)^2)
mean((cv.glmnet.lambda.1se.pred-test.y)^2)
mean((cv.glmnet0.lambda.min.pred-test.y)^2)
mean((cv.glmnet0.lambda.1se.pred-test.y)^2)
mean((cv.genlasso.lambda.min.pred-test.y)^2)