My general thoughts:
So when you are evaluating different models, you may want to tune them, try different types of pre-processing, etc., until you find what you think is a good model. Resampling can help guide you in the right direction during that process.
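For instance, a minimal sketch of resampling-driven tuning with caret (the resampling scheme, tune length, and object names dat and y are illustrative assumptions, not from the question):

library(caret)
# hypothetical: 10-fold CV repeated 5 times to guide tuning
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5)
fit  <- train(y ~ ., data = dat, method = "glmnet",
              trControl = ctrl, tuneLength = 10)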
However, there is still the chance of over-fitting, and the odds of this happening are greatly influenced by how much data (and how many predictors) you have. If you have only a little data, there are a few ways to think about this:
- Use all the data for training since every data point adds significantly to how well the model does.
- Set aside a small test set as a final check for gross errors due to over-fitting (a minimal hold-out sketch follows below). The chance of over-fitting with a small sample size is not small, and it only grows as the sample size shrinks.
I fall into the second camp but the first isn't wrong at all.
If you have a ton of data, then it doesn't really matter much (unless you have a small event rate).
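If you take the second route, here is a minimal hold-out sketch using caret's createDataPartition (the 80/20 split and the data frame dat with outcome column y are assumptions):

library(caret)
set.seed(1)
# hold out ~20% as a final check for gross over-fitting
inTrain  <- createDataPartition(dat$y, p = 0.8, list = FALSE)
training <- dat[inTrain, ]
testing  <- dat[-inTrain, ]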
For you:
You have a DOE. The type of design would help answer the question. Are you trying to interpolate between design points or predict design points that have not been tested so far?
You have one replicate. I feel like random forest is hitting a nail with a sledgehammer here and might result in over-fitting. I would try something smoother, like an SVM or (gasp) a neural network.
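For example, a hedged sketch of a radial-kernel SVM via caret (method = "svmRadial" requires the kernlab package; the training data and tune length are assumptions carried over from the sketch above):

set.seed(1)
svmFit <- train(y ~ ., data = training, method = "svmRadial",
                preProcess = c("center", "scale"),   # SVMs benefit from scaled inputs
                trControl  = trainControl(method = "cv", number = 10),
                tuneLength = 8)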
Max
If you compare the lambdas in the final model with the best lambda obtained from caret, you will see that it is not present in the model's lambda sequence:
lassoFit1$bestTune$lambda
[1] 0.01545996
lassoFit1$bestTune$lambda %in% lassoFit1$finalModel$lambda
[1] FALSE
If you do:
coef(lassoFit1$finalModel,lassoFit1$bestTune$lambda)
8 x 1 sparse Matrix of class "dgCMatrix"
1
(Intercept) -4.532659e-15
Population 1.493984e-01
Income .
Illiteracy .
Murder -7.929823e-01
HS.Grad 2.669362e-01
Frost -1.979238e-01
Area .
This gives you the coefficients at the lambda that was actually tested that is closest to your best-tune lambda. You can of course re-fit the model with your specified lambda and alpha:
fit = glmnet(x = statedata[, c(1:3, 5, 6, 7, 8)], y = statedata[, 4],
             lambda = lassoFit1$bestTune$lambda, alpha = lassoFit1$bestTune$alpha)
fit$beta
7 x 1 sparse Matrix of class "dgCMatrix"
s0
Population 0.1493747
Income .
Illiteracy .
Murder -0.7929223
HS.Grad 0.2669745
Frost -0.1979134
Area .
As you can see, this is close enough to the first approximation.
To answer your other questions:
I get the coefficients. Is this the best model?
You did coef(cvfit, s="lambda.min"), which uses the lambda with the least cross-validated error. If you read the glmnet paper, they go with Breiman's 1-SE rule (see this for a complete view), as it uses a less complicated model. You might want to consider using coef(cvfit, s="lambda.1se") instead.
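As an illustration, assuming cvfit is the cv.glmnet object from your question, you can compare the two choices directly:

# lambda with the smallest cross-validated error
coef(cvfit, s = "lambda.min")
# largest lambda within one standard error of the minimum: a sparser model
coef(cvfit, s = "lambda.1se")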
glmnet does test more lambdas in the cross-validation, is that true? Does caret or glmnet lead to a better model?
It looks like, by default, cv.glmnet tests a defined number of lambdas (in this example it is 67), but you can specify more by passing lambda=<your set of lambdas to test>. You should get similar values using caret or cv.glmnet, but note that you cannot vary alpha within a single cv.glmnet() call.
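For example, a sketch of supplying your own lambda sequence to cv.glmnet (the grid itself is an arbitrary assumption, and x and y stand for your model matrix and response):

# hypothetical grid of 100 lambdas, decreasing on a log scale
lams   <- exp(seq(log(1), log(0.001), length.out = 100))
cvfit2 <- cv.glmnet(x, y, lambda = lams, alpha = 1)  # alpha is fixed per call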
How do I manage to extract the best final model from caret and glmnet
and plug it into a Cox hazards model, for example?
I guess you want to take the non-zero coefficients, and you can do this with:
#exclude intercept
res = coef(cvfit, s="lambda.1se")[-1,]
names(res)[which(res!=0)]
[1] "Murder" "HS.Grad"
Best Answer
Let's say your caret model is called "model". You can access the final glmnet model with model$finalModel. You can then call coef(model$finalModel), etc. You will have to select a value of lambda for which you want coefficients, such as coef(model$finalModel, model$bestTune$.lambda).
Take a look at the summaryFunction parameter of the trainControl function. It will allow you to specify any function you want to minimize (or maximize; see the maximize argument to train), given a predictor and a response. It might be hard to get at adjusted R^2 in this way, but you could probably get R^2 or something similar.
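For instance, a minimal sketch of a custom summaryFunction (the function name and the plain R^2 computation are assumptions; the data/lev/model signature is what caret expects):

rsqSummary <- function(data, lev = NULL, model = NULL) {
  # caret passes a data frame with obs and pred columns
  c(Rsquared = cor(data$obs, data$pred)^2)
}
ctrl <- trainControl(summaryFunction = rsqSummary)
# then: train(..., trControl = ctrl, metric = "Rsquared", maximize = TRUE)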