Solved – Weka Java API: Attribute Selection and Cross Validation

classification, feature-selection, java, weka

Is there a way to perform attribute selection (aka feature selection), regardless of method, only on the training dataset before passing the data to cross-validation?

I currently think the only possible way to do this using the Weka API is through a meta>>AttributeSelectedClassifier. However, I am not yet sure whether this method first performs attribute selection on the whole dataset (without taking the cross-validation folds into account) and then classification, thus possibly introducing bias into the cross-validation evaluation result.

Any ideas?

Best Answer

Suppose you want to evaluate a {feature selector + classifier} metaclassifier using 5-fold cross-validation (CV).

As far as I know, the meta>>AttributeSelectedClassifier is treated like any other classifier: in each fold it is trained on 4/5 of the data and tested on the remaining 1/5. This means the feature selector runs on the training data only, identifying the best features. The reduced feature set is then fed to the classifier and an actual scheme gets built. When the metaclassifier is tested, its feature-selection step simply keeps only those features previously determined to be good; the result is fed to the learned scheme and a prediction is generated.

So, as far as I know, using AttributeSelectedClassifier is the right way to evaluate your scheme.
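For reference, here is a minimal sketch of how such an evaluation might be set up with the Weka Java API. The evaluator (CfsSubsetEval), search method (GreedyStepwise), and base classifier (J48) are just example choices, and the ARFF file path is a placeholder:

```java
import java.util.Random;

import weka.attributeSelection.CfsSubsetEval;
import weka.attributeSelection.GreedyStepwise;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.AttributeSelectedClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class AttributeSelectionCV {
    public static void main(String[] args) throws Exception {
        // Placeholder path -- substitute your own dataset
        Instances data = DataSource.read("data.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Feature selection + classifier bundled into one metaclassifier
        AttributeSelectedClassifier asc = new AttributeSelectedClassifier();
        asc.setEvaluator(new CfsSubsetEval());   // example subset evaluator
        asc.setSearch(new GreedyStepwise());     // example search strategy
        asc.setClassifier(new J48());            // example base classifier

        // 5-fold CV: attribute selection is re-run inside each training fold,
        // so the held-out fold never influences which features are picked
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(asc, data, 5, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```

Because the whole {selector + classifier} bundle is passed to crossValidateModel as a single classifier, the selection step is retrained per fold, which is exactly what avoids the bias the question worries about.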

A word of advice: I would also fold in data normalization/standardization, missing-value imputation, and/or any parameter search for the actual classifier. You will end up with an actual classifier wrapped in several "metaclassifiers".
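One way that nesting could look, sketched with Weka's FilteredClassifier, MultiFilter, and CVParameterSelection; the specific filters and the tuned parameter (J48's confidence factor) are illustrative assumptions, not the only reasonable setup:

```java
import weka.classifiers.meta.AttributeSelectedClassifier;
import weka.classifiers.meta.CVParameterSelection;
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.filters.Filter;
import weka.filters.MultiFilter;
import weka.filters.unsupervised.attribute.Normalize;
import weka.filters.unsupervised.attribute.ReplaceMissingValues;

public class NestedMetaClassifiers {
    // Builds classifier -> parameter search -> attribute selection -> filtering,
    // all as one classifier that can be handed to Evaluation.crossValidateModel
    public static FilteredClassifier build() throws Exception {
        // Innermost: tune J48's confidence factor via internal cross-validation
        CVParameterSelection tuned = new CVParameterSelection();
        tuned.setClassifier(new J48());
        tuned.addCVParameter("C 0.1 0.5 5"); // example: C from 0.1 to 0.5, 5 steps

        // Next layer: attribute selection wrapped around the tuned classifier
        AttributeSelectedClassifier asc = new AttributeSelectedClassifier();
        asc.setClassifier(tuned);

        // Outermost: impute missing values, then normalize; both filters are
        // fitted on the training folds only, just like the attribute selection
        MultiFilter mf = new MultiFilter();
        mf.setFilters(new Filter[] { new ReplaceMissingValues(), new Normalize() });
        FilteredClassifier fc = new FilteredClassifier();
        fc.setFilter(mf);
        fc.setClassifier(asc);
        return fc;
    }
}
```

The outermost FilteredClassifier is what you evaluate, so every preprocessing step stays inside each training fold of the outer cross-validation.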
