I am trying to use the generalized lasso function `genlasso {genlasso}` to incorporate an additional penalty matrix into the estimation process.

I started with a "hello world" example: **I used a diagonal matrix as the penalty matrix in a model**. I realized that the predictions I get from this model are very far from what I expected:

- the predictions are negative, whereas my response variable is positive.

A regular Lasso model from `glmnet` gives me at least reasonable (positive) predictions, so my question is: **what am I missing in the function call, or what do I not understand, such that I end up with such unintuitive predictions from the `genlasso {genlasso}` model?**

Reproducible example:

Step 1. Simulate data based on covariance and mean estimates from my real data (and correct the simulated response for negative values).

```
# Read objects describing simulation structure
sim.cov <- dget(url("https://raw.githubusercontent.com/martakarass/so_q/master/genlasso/sim_cov"))
sim.means <- dget(url("https://raw.githubusercontent.com/martakarass/so_q/master/genlasso/sim_means"))
# Simulate X (explanatory variables) and Y (response) matrices
# from multivariate normal
library(MASS)
set.seed(1)
sim.sample <- mvrnorm(n = 150, mu = sim.means, Sigma = sim.cov)
sim.sample.x <- sim.sample[, 1:66]
sim.sample.y <- sim.sample[, 67]
# Correct for negative response variable value
sim.sample.y[sim.sample.y < 0]
# [1] -0.8409024 -1.8470264 -1.8503388
sim.sample.y[sim.sample.y < 0] <- abs(sim.sample.y[sim.sample.y < 0])
summary(sim.sample.y)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 0.01549 7.39000 10.93000 10.72000 13.58000 24.77000
```

Step 2. Simulate train and test subset indices.

```
train.idc <- as.logical(rbinom(n = nrow(sim.sample), size = 1, prob = 0.8))
test.idc <- !train.idc
```

Step 3. Build a regular Lasso model and predict values using the best lambda value from cross-validation.

```
# LASSO
# Cross-validate on a train set
library(glmnet)
cv.glmnet.out <- cv.glmnet(x = sim.sample.x[train.idc, ],
                           y = sim.sample.y[train.idc],
                           nfolds = 10, type.measure = "mse", alpha = 1)
lambda.cv <- cv.glmnet.out$lambda.min
model.lasso <- glmnet(sim.sample.x[train.idc, ], sim.sample.y[train.idc],
                      alpha = 1)
model.lasso.predicted <- predict(model.lasso, s = lambda.cv,
                                 newx = sim.sample.x[test.idc, ])
# Investigate predictions from a model
summary(as.vector(model.lasso.predicted))
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 3.412 9.235 10.850 10.960 12.760 17.950
```

Step 4. Build a generalized Lasso model. For convenience, use the same best lambda as in the regular Lasso model.

```
# LASSO `genlasso`: diagonal penalty matrix
p <- ncol(sim.sample.x)
library(genlasso)
genlasso.diag.out <- genlasso(y = sim.sample.y[train.idc],
                              X = sim.sample.x[train.idc, ],
                              D = diag(1, p))
model.genlasso.diag.predicted <- predict(genlasso.diag.out, lambda = lambda.cv,
                                         Xnew = sim.sample.x[test.idc, ])$fit
```

Investigate the predicted values, which turn out to be negative:

```
summary(as.vector(model.genlasso.diag.predicted))
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# -27.2800 -6.0070 1.2580 0.3049 6.2230 24.4700
```

## Best Answer

There are two issues:

`glmnet` includes an intercept by default while `genlasso` does not. To disable the intercept in `glmnet`, use `intercept = FALSE`; it will reduce the performance significantly. To instead add an intercept in `genlasso`, `cbind` a column of ones onto your `X` matrix, and change `D` accordingly. If you choose to penalize the intercept, you might need to set the corresponding value in the `p+1` by `p+1` `D` matrix more carefully than in the example code below. By setting `penalize.intercept` to `FALSE` in the code below, the intercept is not penalized: the `p` by `p+1` penalty matrix `D` is an identity matrix with the row corresponding to the intercept cut off.
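
For concreteness, here is a minimal sketch of that construction (the object names are illustrative, not taken from the original code):

```
# Intercept-augmented design matrix and a p x (p+1) penalty matrix that
# leaves the intercept unpenalized: an identity matrix with the row
# corresponding to the intercept removed
p     <- ncol(sim.sample.x)
x.int <- cbind(1, sim.sample.x[train.idc, ])  # column 1 is the intercept
D.int <- cbind(0, diag(p))                    # p rows, p+1 columns; no row penalizes column 1
```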

`glmnet` and `genlasso` are on different scales. Note that `glmnet` optimizes $\frac{1}{2N}\sum_{i=1}^N(y_i-\beta_0-x_i^T\beta)^2+\lambda\left[\frac{1-\alpha}{2}\left\|\beta\right\|_2^2+\alpha\left\|\beta\right\|_1\right]$, while `genlasso` optimizes $\frac12\left\|y-X\beta\right\|_2^2+\lambda\left\|D\beta\right\|_1$. Both use the $L^1$ norm for the lasso penalty, but `glmnet`'s error term is the squared $L^2$ norm divided by $2N$, while `genlasso`'s formulation divides by just $2$. The $\lambda$ values from one must therefore be rescaled before being applied to the other. (This is also an issue when using CV'd $\lambda$ values to fit a model with a significantly different $N$, which is ignored in the code below.) We can either scale `lambda.cv` to the appropriate value for `genlasso`, or use cross-validation to select an appropriate value directly; the code below takes the latter approach. Note that `glmnet` provides two methods of selecting the `lambda` value from cross-validation: the one giving the minimum CV loss (`lambda.min`), and the 1-standard-error rule, which adds some more regularization in exchange for slightly higher CV loss (`lambda.1se`). The 1se rule produces better out-of-sample performance in this case.
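
As a rough illustration of the first option (rescaling rather than re-cross-validating): multiplying `glmnet`'s objective by $N$ suggests that its `lambda` corresponds to roughly `N * lambda` on the `genlasso` scale when $\alpha = 1$ and `D` is an identity matrix. A sketch:

```
# Rough conversion of the glmnet lambda to the genlasso scale:
# glmnet divides the squared-error term by 2N, genlasso only by 2,
# so multiply by the number of training observations
n.train <- sum(train.idc)
lambda.genlasso.approx <- n.train * lambda.cv
```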

Code for `glmnet` without an intercept, and for `genlasso` with an intercept and a CV-selected `lambda`:
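
A sketch along those lines (the fold assignment, the `lambda` grid, the seed, and the object names below are assumptions for illustration, not taken from the original answer):

```
# glmnet without an intercept (for comparison; as noted above, this
# reduces performance significantly)
cv.noint <- cv.glmnet(x = sim.sample.x[train.idc, ], y = sim.sample.y[train.idc],
                      nfolds = 10, type.measure = "mse", alpha = 1,
                      intercept = FALSE)

# genlasso with an intercept column and an unpenalized-intercept D matrix
x.tr  <- cbind(1, sim.sample.x[train.idc, ])
y.tr  <- sim.sample.y[train.idc]
x.te  <- cbind(1, sim.sample.x[test.idc, ])
p     <- ncol(sim.sample.x)
D.int <- cbind(0, diag(p))   # p x (p+1): identity with the intercept row cut off

# Fit once on the full training set (also provides a path of lambda values)
fit.full <- genlasso(y = y.tr, X = x.tr, D = D.int)

# 10-fold cross-validation over a lambda grid contained in every fold's path
set.seed(2)   # assumed seed, only to make the folds reproducible
K <- 10
folds <- sample(rep(1:K, length.out = length(y.tr)))
fold.fits <- lapply(1:K, function(k)
  genlasso(y = y.tr[folds != k], X = x.tr[folds != k, ], D = D.int))
lam.max     <- min(sapply(fold.fits, function(f) max(f$lambda)))
lambda.grid <- seq(lam.max, 0.01, length.out = 50)

cv.mse <- sapply(1:K, function(k) {
  pred <- predict(fold.fits[[k]], lambda = lambda.grid,
                  Xnew = x.tr[folds == k, , drop = FALSE])$fit
  colMeans((pred - y.tr[folds == k])^2)
})
lambda.cv.gen <- lambda.grid[which.min(rowMeans(cv.mse))]

# Predict on the test set at the CV-selected lambda
pred.gen <- predict(fit.full, lambda = lambda.cv.gen, Xnew = x.te)$fit
summary(as.vector(pred.gen))
```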