Nobody ever reads the documentation :-/
The package vignette for feature selection has all the details. They can now be found at:
http://caret.r-forge.r-project.org/featureselection.html
in Algorithm #2.
In your case, you have inner resampling to tune the SVM at each iteration (line 2.9 of Algorithm #2) and an outer one to evaluate the number of predictors (line 2.1).
Why does it do this? With small to moderate numbers of instances, a simple partition to a single test set does a very poor job of estimating performance and may very well over-fit to the predictors. [1] concisely summarizes this point: "hold-out samples of tolerable size [...] do not match the cross-validation itself for reliability in assessing model fit and are hard to motivate".
I would advise reading [2], which illustrates how difficult validating feature selection can be. If you have a lot of data, perhaps a single test set would be sufficient.
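To make the nested structure concrete, here is a minimal sketch (not your code; the simulated data, subset sizes, and tuning settings are my own choices) of rfe with an outer loop over predictor subsets and an inner loop to tune the SVM:

```r
library(caret)    # rfe, rfeControl, trainControl, twoClassSim
library(kernlab)  # backend for method = "svmRadial"

set.seed(1)
dat <- twoClassSim(100)   # simulated two-class data shipped with caret

# Outer resampling (line 2.1 of Algorithm #2): evaluates each subset size
outer <- rfeControl(functions = caretFuncs, method = "cv", number = 5)

# Inner resampling (line 2.9): tunes the SVM within each outer resample
inner <- trainControl(method = "cv", number = 3)

prof <- rfe(dat[, names(dat) != "Class"], dat$Class,
            sizes = c(4, 8, 12),
            rfeControl = outer,
            method = "svmRadial", tuneLength = 2, trControl = inner)

predictors(prof)   # variables in the subset chosen by the outer loop
```

The extra arguments (`method`, `tuneLength`, `trControl`) are passed through to train() because `caretFuncs` is used, which is what produces the inner tuning loop.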
One other note: you don't show what svmFuncs is exactly, so I don't know how you are estimating variable importance. If you are using the default method, it analyzes each predictor independently, so using rerank = TRUE is a waste of time (i.e., the values will be the same at each calculation).
Max
[1] Hawkins, D. M., Basak, S. C., & Mills, D. (2003). Assessing Model Fit by Cross-Validation. Journal of Chemical Information and Modeling, 43(2), 579–586. doi:10.1021/ci025626i
[2] Ambroise, C., & McLachlan, G. (2002). Selection bias in gene extraction on the basis of microarray gene-expression data. Proceedings of the National Academy of Sciences, 99(10), 6562–6566.
I don't think caret supports multi-task learning in any of its functions. You could try the glmnet package with family set to "mgaussian". This will allow you to do feature selection via lasso regularization, ridge regularization, or elastic net regularization for a linear regression model.
There may be other R machine learning libraries with built-in feature selection that support multi-task learning. Here's some sample code for multi-task learning, using the lasso for variable selection, adapted from ?glmnet:
#Create a dataset
set.seed(42)
library(glmnet)
x=matrix(rnorm(100*20),100,20)
cf <- sample(0:1, 20, replace=TRUE) #Select some columns
response1 <- x %*% (cf*runif(20)) #Apply random coefficients
response2 <- x %*% (cf*runif(20))
y=cbind(response1, response2)
#Fit a single lasso model
#0 for ridge
#1 for lasso
#>0 & <1 for the elastic net (mix of ridge and lasso)
fit1m=glmnet(x,y,family="mgaussian",alpha=1)
plot(fit1m,type.coef="2norm")
#Select lambda through cross validation
fit1m.cv <- cv.glmnet(x,y,family="mgaussian",alpha=1)
plot(fit1m.cv)
coef(fit1m.cv) #Show coefficients at the selected value of lambda
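To see which predictors the lasso actually keeps, pull the nonzero coefficients at the cross-validated lambda. A self-contained sketch (rebuilding the same simulated data so it runs on its own):

```r
library(glmnet)
set.seed(42)
x <- matrix(rnorm(100*20), 100, 20)
cf <- sample(0:1, 20, replace = TRUE)
y <- cbind(x %*% (cf * runif(20)), x %*% (cf * runif(20)))

cvfit <- cv.glmnet(x, y, family = "mgaussian", alpha = 1)

# coef() returns one sparse matrix per response; the grouped "mgaussian"
# penalty zeroes out the same rows for both, so one response suffices
beta <- coef(cvfit, s = "lambda.min")[[1]]
keep <- rownames(beta)[as.vector(beta) != 0]
setdiff(keep, "(Intercept)")   # names of the selected predictors
```

This is the multi-task payoff: a variable is dropped or kept for both responses at once, rather than separately per response.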
Best Answer
caret has a stepLDA method available in train. This uses stepclass in the klaR package. There are also LDA feature selection tools in caret using rfe and sbf that would be helpful.
Max
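As a minimal sketch of the stepLDA route (the iris data and 5-fold CV are my own choices, not from the answer), train() drives klaR's stepclass search for you:

```r
library(caret)   # train, trainControl
library(klaR)    # stepclass, used internally by method = "stepLDA"

set.seed(123)
ctrl <- trainControl(method = "cv", number = 5)

# stepclass adds/drops predictors to improve cross-validated accuracy,
# then an LDA model is fit on the retained set
fit <- train(Species ~ ., data = iris, method = "stepLDA", trControl = ctrl)
fit   # resampling summary for the selected model
```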