Machine Learning – Using Correlation Based Feature Selection (CFS) Tool

correlation, feature selection, machine learning

Is there any tool or script that implements correlation-based feature selection? My feature vectors are stored in a very large data file, so when I use tools like Weka for feature selection, I don't get any result!

Best Answer

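You could use findCorrelation from the R package caret for this. Given a correlation matrix, it heuristically picks columns to remove so that the pairwise correlations among the remaining features are reduced, using a cutoff you specify. Note that it only looks at correlations between features, not at correlation with a target variable, so it is a simpler filter than CFS in the strict sense. You will have to try out whether it still works for your data; I assume it will also run into problems with very large datasets, since the full correlation matrix has to be computed first: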

# original correlation
library(corrplot)
corrplot(cor(mtcars), type = 'lower')

[Figure: original correlation]

# find the columns to remove (findCorrelation returns their indices)
library(caret)
toRemove <- findCorrelation(cor(mtcars), cutoff = 0.7)

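# remaining correlation
# (setdiff keeps all columns if toRemove is empty; mtcars[, -integer(0)] would drop them all)
corrplot(cor(mtcars[, setdiff(seq_along(mtcars), toRemove)]), type = 'lower')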

[Figure: remaining correlation]
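
To make it clearer what a pairwise-correlation filter of this kind does, here is a minimal base-R sketch that needs no extra packages. It is a simplified greedy variant and not caret's actual implementation; the helper name greedy_cor_filter and the tie-breaking rule (drop the member of the worst pair with the higher mean absolute correlation) are assumptions made for illustration.

```r
# Simplified greedy correlation filter (illustrative sketch, not caret's code).
# greedy_cor_filter is a hypothetical helper name.
greedy_cor_filter <- function(x, cutoff = 0.7) {
  cm <- abs(cor(x))   # absolute pairwise correlations
  diag(cm) <- 0       # ignore each feature's correlation with itself
  dropped <- character(0)
  while (ncol(cm) > 1 && max(cm) > cutoff) {
    # locate the most strongly correlated remaining pair
    pair <- which(cm == max(cm), arr.ind = TRUE)[1, ]
    # of the two, drop the one that is on average more correlated with the rest
    avg <- colMeans(cm[, pair])
    worst <- colnames(cm)[pair[which.max(avg)]]
    dropped <- c(dropped, worst)
    keep <- colnames(cm) != worst
    cm <- cm[keep, keep, drop = FALSE]
  }
  list(kept = colnames(cm), dropped = dropped)
}

res <- greedy_cor_filter(mtcars, cutoff = 0.7)
print(res$dropped)   # features flagged for removal
print(res$kept)      # features that remain

# largest absolute correlation left among the kept features
remaining <- abs(cor(mtcars[, res$kept, drop = FALSE]))
print(max(remaining[upper.tri(remaining)]))
```

Because the loop only stops once no remaining pair exceeds the cutoff, the last printed value is at most 0.7 here. The set of dropped features need not match what findCorrelation returns, since the two procedures make their choices differently.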
