Lasso Regression – Resolving NULL Lasso Output from gamlasso in the plsmselect Package

generalized-additive-model lasso

I was fitting a generalized additive model (GAM) with a LASSO penalty to my data using gamlasso, as shown below (here I use simulated data, but the idea is the same):

library(plsmselect)  # provides gamlasso()

set.seed(1)
Sex <- sample(c(rep(1, 40), rep(0, 60)))
set.seed(2)
Group <- sample(c(rep(1, 30), rep(0, 70)))
set.seed(3)
Ed <- sample(c(rep(1, 20), rep(0, 80)))
Age <- sample(c(rep(7, 15), rep(8, 20), rep(9, 30), rep(10, 10), rep(11, 15), rep(12, 10)))

sample_data <- data.frame(Age = Age,
                          Sex = Sex,
                          Ed = Ed,
                          Group = Group,
                          Perf1 = runif(100, -200, 100),
                          Perf2 = runif(100, 20, 200),
                          Perf3 = runif(100, 0.03, 0.3)
                          )

sample_data$x <- model.matrix(~ Age + Sex + Ed, data = sample_data)[, -1]

gamlasso_fit <- gamlasso(Group ~ x + s(Perf1) + s(Perf2) + s(Perf3),
                         data = sample_data, 
                         smooth.penalty = "l1", 
                         linear.penalty = "l1", 
                         family = "binomial")

gamlasso_fit$cv.glmnet
summary(gamlasso_fit, s = lambda.1se)

Which gives me:

> gamlasso_fit$cv.glmnet
NULL
> summary(gamlasso_fit, s = lambda.1se)
$lasso
NULL

$gam

Family: binomial 
Link function: logit 

Formula:
Group ~ x + s(Perf1) + s(Perf2) + s(Perf3)

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept) -1.88730    1.63831  -1.152   0.2493  
xAge         0.09693    0.17289   0.561   0.5750  
xSex         0.71345    0.53603   1.331   0.1832  
xEd         -1.71468    0.88280  -1.942   0.0521 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
           edf Ref.df Chi.sq p-value
s(Perf1) 4.381  5.363  4.858   0.494
s(Perf2) 1.000  1.000  0.009   0.926
s(Perf3) 4.734  5.740  5.770   0.395

R-sq.(adj) =   0.12   Deviance explained = 19.4%
UBRE = 0.26659  Scale est. = 1         n = 100

My actual model gives the same kind of result: the lasso part ($lasso) is missing from the output (see the summary and gamlasso_fit$cv.glmnet), even though it should be there given my model specification. Am I specifying something incorrectly? I followed the vignette of the plsmselect package and its examples, but I think I may be missing something. I am using a binomial GAM with a LASSO penalty because a GLM (with and without a LASSO penalty) showed non-linearity between the log-odds of the outcome (Group) and some of the predictors.

Best Answer

Change every instance of x (lowercase) to X (uppercase). Your code then becomes:

library(plsmselect)  # provides gamlasso()

set.seed(1)
Sex <- sample(c(rep(1, 40), rep(0, 60)))
set.seed(2)
Group <- sample(c(rep(1, 30), rep(0, 70)))
set.seed(3)
Ed <- sample(c(rep(1, 20), rep(0, 80)))
Age <- sample(c(rep(7, 15), rep(8, 20), rep(9, 30), rep(10, 10), rep(11, 15), rep(12, 10)))

sample_data <- data.frame(Age = Age,
                          Sex = Sex,
                          Ed = Ed,
                          Group = Group,
                          Perf1 = runif(100, -200, 100),
                          Perf2 = runif(100, 20, 200),
                          Perf3 = runif(100, 0.03, 0.3)
                          )

sample_data$X <- model.matrix(~ Age + Sex + Ed, data = sample_data)[, -1]

gamlasso_fit <- gamlasso(Group ~ X + s(Perf1) + s(Perf2) + s(Perf3),
                         data = sample_data, 
                         smooth.penalty = "l1", 
                         linear.penalty = "l1", 
                         family = "binomial")

is.null(gamlasso_fit$cv.glmnet) # FALSE
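If you would rather have a script fail loudly than silently return a fit without its lasso part, a small guard on the cv.glmnet component can be run right after fitting. This is a sketch, not a plsmselect function: assert_lasso_fitted is a hypothetical helper, and the demo uses plain lists as stand-ins for fitted objects so that it runs without the package.

```r
# Hypothetical helper (not part of plsmselect): stop early if the lasso
# part of a gamlasso fit is missing, e.g. because the linear-term matrix
# was not named X.
assert_lasso_fitted <- function(fit) {
  if (is.null(fit$cv.glmnet)) {
    stop("No cv.glmnet component: the lasso part was not fitted. ",
         "Check that the penalised linear terms are in a matrix column named X.",
         call. = FALSE)
  }
  invisible(fit)  # return the fit unchanged so the call can be chained
}

# Intended use right after fitting:
# gamlasso_fit <- assert_lasso_fitted(gamlasso_fit)

# Stand-in object (a plain list) mimicking the question's NULL result:
bad_fit <- list(cv.glmnet = NULL, gam = "gam part only")
msg <- tryCatch(assert_lasso_fitted(bad_fit), error = conditionMessage)
print(msg)
```

With the lowercase x from the question, gamlasso_fit$cv.glmnet was NULL, so this guard would have stopped with the message above instead of letting the missing $lasso go unnoticed.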

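The documentation does not explicitly say that the matrix of linear terms must be named X (uppercase), but the package is hard-coded to look for that name. With a lowercase x the matrix is not picked up for the lasso part; it just enters the $gam part as ordinary parametric terms, which is why your summary shows xAge, xSex and xEd there while $lasso is NULL. In the next version of the package we will try to make this more flexible and document it more clearly.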
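A small base-R sketch of what the fix involves, relying only on what is stated above (that gamlasso looks for the name X): the model.matrix step stores Age, Sex and Ed as one matrix column, so the formula refers to all three linear terms through that single name. The data values are arbitrary, and rename_x_to_X is a hypothetical helper, not part of plsmselect; it only renames an existing lowercase x column so earlier code does not need to be rebuilt.

```r
# Stand-in for sample_data with a few arbitrary rows (base R only).
df <- data.frame(Age = c(7, 8, 9, 10),
                 Sex = c(1, 0, 1, 0),
                 Ed  = c(0, 0, 1, 1))

# As in the answer: build the design matrix, drop the intercept column,
# and store the result as ONE matrix column named X.
df$X <- model.matrix(~ Age + Sex + Ed, data = df)[, -1]

str(df$X)   # 4 x 3 numeric matrix with columns Age, Sex, Ed
ncol(df)    # 4: Age, Sex, Ed, plus the single matrix column X

# The formula refers to all three linear terms through that one name.
all.vars(Group ~ X + s(Perf1) + s(Perf2) + s(Perf3))

# Hypothetical helper (not part of plsmselect): rename an existing
# lowercase x column to X instead of rebuilding it.
rename_x_to_X <- function(data) {
  if ("x" %in% names(data) && !("X" %in% names(data))) {
    names(data)[names(data) == "x"] <- "X"
  }
  data
}

old <- data.frame(Age = 1:2)
old$x <- cbind(a = 1:2, b = 3:4)
names(rename_x_to_X(old))   # "Age" "X"
```

Because R names are case-sensitive, x and X are different columns; renaming (or building the column as X in the first place) is all that changes between the question's code and the corrected code.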
