I would like to use GLM and Elastic Net to select the relevant features and build a linear regression model (i.e., both prediction and understanding, so it would be better to be left with relatively few parameters). The output is continuous. It's $20000$ genes per $50$ cases. I've been reading about the glmnet package, but I'm not 100% sure about the steps to follow:
1. Perform CV to choose lambda:

cv <- cv.glmnet(x, y, alpha = 0.5)

(Q1) Given the input data, would you choose a different alpha value?
(Q2) Do I need to do something else before building the model?

2. Fit the model:

model <- glmnet(x, y, type.gaussian = "covariance", lambda = cv$lambda.min)

(Q3) Anything better than "covariance"?
(Q4) If lambda was chosen by CV, why does this step need nlambda=?
(Q5) Is it better to use lambda.min or lambda.1se?

3. Obtain the coefficients, to see which parameters have fallen out ("."):

predict(model, type = "coefficients")
In the help page there are many predict methods (e.g., predict.fishnet, predict.glmnet, predict.lognet, etc.), but not a "plain" predict, which is what I saw used in an example.
(Q6) Should I use predict, predict.glmnet, or something else?
Despite what I've read about regularization methods, I'm quite new to R and to these statistical packages, so it's difficult to be sure whether I'm adapting my problem to the code correctly. Any suggestions are welcome.
UPDATE
Based on "As previously noted, an object of class train contains an element called finalModel
, which is the fitted model with the tuning parameter values selected by resampling. This object can be used in the traditional way to generate predictions for new samples, using that model's
predict function."
Using caret to tune both alpha and lambda:

trc <- trainControl(method = "cv", number = 10)
fitM <- train(x, y, trControl = trc, method = "glmnet")
Does fitM replace step 2 above? If so, how do I specify the glmnet options (type.gaussian = "naive", lambda = cv$lambda.min or cv$lambda.1se) now?
And in the following predict step, can I replace model with fitM?
If I do

trc <- trainControl(method = "cv", number = 10)
fitM <- train(x, y, trControl = trc, method = "glmnet")
predict(fitM$finalModel, type = "coefficients")

does that make sense at all, or am I incorrectly mixing the vocabulary of the two packages?
Best Answer
Part 1
In the elastic net, two types of constraints on the parameters are employed: lasso constraints (on the absolute values of the coefficients) and ridge constraints (on their squared values). $\alpha$ controls the relative weighting of the two types. The lasso constraints allow for the selection/removal of variables in the model, while the ridge constraints can cope with collinear variables. Which you put more weight upon will depend on the properties of your data; lots of correlated variables may need both constraints, whereas a few correlated variables might suggest more emphasis on the ridge constraints.
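For reference, the gaussian objective that glmnet minimises (intercept omitted here) is usually written as

$$\frac{1}{2n}\sum_{i=1}^{n}\left(y_i - x_i^\top\beta\right)^2 + \lambda\left[(1-\alpha)\tfrac{1}{2}\lVert\beta\rVert_2^2 + \alpha\lVert\beta\rVert_1\right],$$

so $\alpha = 1$ gives the pure lasso penalty and $\alpha = 0$ gives pure ridge.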
One way to resolve this is to treat $\alpha$ as a tuning parameter alongside $\lambda$, and use the pair of values that gives the lowest CV error, in the same way that you are currently tuning over $\lambda$ with cv.glmnet. The R package caret can build models using the glmnet package and can be set up to tune over both parameters $\alpha$ and $\lambda$.
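Something like the following (just a sketch: the grid values are arbitrary illustrations, not recommendations, and x and y are the data from your question):

library(caret)

# 10-fold CV, tuning jointly over a grid of (alpha, lambda) pairs -- grid values are illustrative only
ctrl <- trainControl(method = "cv", number = 10)
grid <- expand.grid(alpha  = seq(0, 1, by = 0.25),
                    lambda = 10^seq(-3, 1, length.out = 20))
fitM <- train(x, y, method = "glmnet", trControl = ctrl, tuneGrid = grid)
fitM$bestTune  # the (alpha, lambda) pair with the lowest resampled error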
Part 2
Q3
Yes. In this case, where $m \gg n$ (number of variables $\gg$ number of observations), the help page for ?glmnet suggests using type.gaussian = "naive": instead of storing all the inner products computed along the way, which can be inefficient with a large number of variables or when $m \gg n$, the "naive" option will loop over $n$ each time it is required to compute inner products. If you had not specified this argument, glmnet would have chosen "naive" anyway, as $m > 500$, but it is better to specify this explicitly in case the defaults and options change later in the package and you are running the code at a future date.
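In code, that just means making the argument explicit in the fitting step from your question (a sketch, assuming x, y, and the cv object from cv.glmnet above):

# fit at the CV-selected lambda, with the "naive" algorithm made explicit
model <- glmnet(x, y, type.gaussian = "naive", lambda = cv$lambda.min)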
Q4
Short answer: you don't need to specify a high value for nlambda now that you have chosen an optimal value, conditional on $\alpha = 0.5$. However, if you want to plot the coefficient paths, then having a modest set of $\lambda$ values over the interval results in a much nicer set of paths. The computational burden of fitting the entire path, relative to fitting at one specific $\lambda$, is not that great, thanks to a lot of effort that went into developing algorithms to do this job efficiently. I would just leave nlambda at the default, unless it makes an appreciable difference in compute time.
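For instance, to look at the coefficient paths over the default $\lambda$ sequence (a sketch, again assuming the x and y from your question):

# fit the full regularisation path with the default nlambda and plot the coefficient paths
path <- glmnet(x, y, alpha = 0.5)
plot(path, xvar = "lambda", label = TRUE)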
Q5
This is a question about parsimony. The lambda.min option refers to the value of $\lambda$ with the lowest CV error. The error at this value of $\lambda$ is the average of the errors over the $k$ folds, and hence this estimate of the error is uncertain. lambda.1se represents the value of $\lambda$ in the search that gives a simpler model than the best model (lambda.min), but whose error is within 1 standard error of the best model's. In other words, using lambda.1se as the selected value of $\lambda$ results in a model that is slightly simpler than the best model but which cannot be distinguished from the best model in terms of error, given the uncertainty in the $k$-fold CV estimate of the error of the best model.
The choice is yours:
- lambda.min: the model with the lowest CV error, or
- lambda.1se: a more parsimonious model whose error is indistinguishable from that of the best model.
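Either way, the cv.glmnet object from your step 1 lets you pull out the coefficients at whichever value you settle on (a sketch, assuming the cv object from your question):

coef(cv, s = "lambda.min")  # coefficients at the lambda with the lowest CV error
coef(cv, s = "lambda.1se")  # coefficients at the more parsimonious lambda.1se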
Part 3
This is a simple one, and is something you'll come across a lot with R. You use the predict() function 99.9% of the time; R will arrange for the use of the correct function for the object supplied as the first argument.
More technically, predict is a generic function, which has methods (versions of the function) for objects of different types (technically known as classes). The object created by glmnet has a particular class (or classes) depending on what type of model is actually fitted. glmnet (the package) provides methods for the predict function for these different types of objects. R knows about these methods and will choose the appropriate one based on the class of the object supplied.
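As an illustration (a sketch, assuming the gaussian model object from your question; the exact class names differ for other model families):

class(model)                           # e.g. "elnet" "glmnet" for a gaussian fit
predict(model, type = "coefficients")  # R dispatches to the glmnet predict method for this class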