I am trying to do some feature selection on a data set with around 3500 variables and about 200 samples. Each sample has two numerical target values associated with it (the expected outcome). I can't get caret to work with this, or even find any information on it. Does anybody know how to do this?
As an example, my data is roughly in the following format:
Samples:
S1 2.1 1.2 3.1 ... 4.2 1.7 5.2
S2 3.4 1.1 4.5 ... 5.3 1.2 5.7
...
S3499 2.4 3.5 5.1 ... 2.2 1.5 5.7
S3500 4.1 1.2 5.4 ... 1.2 2.1 5.8
Targets:
S1 1.82 1.44
S2 2.44 1.22
...
S3499 1.23 1.32
S3500 1.99 1.51
Thanks,
Swatchpuppy
Best Answer
I don't think caret supports multi-task learning in any of its functions. You could try the glmnet package, with family set to "mgaussian". This will allow you to do feature selection via lasso or elastic net regularization for a linear regression model (ridge regularization is also available, but it only shrinks coefficients and never sets them exactly to zero, so it does not select features). There may be other R machine learning libraries with built-in feature selection that support multi-task learning. Here's some sample code for multi-task learning, using the lasso for variable selection, adapted from ?glmnet:
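A minimal sketch of that approach, assuming synthetic data in place of the real samples and targets. The matrix sizes (scaled down to 500 features so it runs quickly), the seed, the simulated coefficients, and the choice of lambda.1se are all illustrative assumptions, not part of the original answer:

```r
library(glmnet)

set.seed(1)  # arbitrary seed, for reproducibility only

# Synthetic stand-in for the real data (assumption): 200 samples, 500 features.
n <- 200
p <- 500
x <- matrix(rnorm(n * p), n, p)

# Two targets that both depend on the first 5 features only (assumption).
beta <- matrix(0, p, 2)
beta[1:5, 1] <- 2
beta[1:5, 2] <- -2
y <- x %*% beta + matrix(rnorm(n * 2), n, 2)

# Cross-validated multi-response lasso (alpha = 1 is the lasso penalty).
cvfit <- cv.glmnet(x, y, family = "mgaussian", alpha = 1)

# coef() returns a list with one sparse coefficient matrix per target.
coefs <- coef(cvfit, s = "lambda.1se")

# Drop the intercept row and keep the indices of features with
# nonzero coefficients.
selected <- which(coefs[[1]][-1, 1] != 0)
print(selected)
```

With family = "mgaussian", glmnet applies a grouped penalty across the responses, so a feature is either kept for both targets or dropped for both. That is why inspecting the coefficients of the first target is enough to read off the selected set. On real data, x would be the 200 x 3500 sample matrix and y the 200 x 2 target matrix.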