I am working on a speech emotion recognition problem and my training dataset consists of about $4000$ points of $138$ features each. The highest (Pearson) correlation among the features is $0.3$ and there are only $7$ features which are correlated in the range $(0.3, 0.4)$ to the target values.
Does it make sense to investigate feature selection techniques in this case? My understanding is that it does not, since the correlations between the features, and between the features and the target, are quite low. However, I would appreciate your thoughts on this because I do not have much experience in this field. Thank you.
Best Answer
Since you are working on a speech emotion recognition problem, I assume that's quite complex data, and I assume you are not using simple linear methods like linear regression. Please correct me if I'm wrong in this assumption.
General Notes about your problem:
That's a perfect setup to answer your question directly:
Yes, it absolutely makes sense to investigate feature selection techniques.
Reasons: Pearson correlation only measures linear, one-variable-at-a-time relationships. Features with low individual correlation to the target can still be predictive in combination, or related to the target nonlinearly, which is exactly what flexible models exploit. And with 138 features on roughly 4000 points, pruning uninformative features can reduce overfitting and training time.
Pointers to get started on feature selection:
Again, it depends on your model, but broadly speaking, I would heavily recommend some version of Permutation Feature Importance to figure out which features are helpful. Read more here: https://scikit-learn.org/stable/modules/permutation_importance.html
There are various packages that implement it, like sklearn in Python and Boruta in R.
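To make this concrete, here is a minimal sketch of permutation feature importance with sklearn. Synthetic data stands in for your real 4000 x 138 speech features, and the random forest is just a placeholder model; swap in your own data and estimator.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real speech-emotion features.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature on held-out data and measure the drop in score;
# a large drop means the model relied on that feature.
result = permutation_importance(model, X_test, y_test, n_repeats=10,
                                random_state=0)
ranked = np.argsort(result.importances_mean)[::-1]
for i in ranked[:5]:
    print(f"feature {i}: {result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")
```

Note that the importances are computed on held-out data, which is what makes this a measure of generalization rather than of training fit.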
Quick tip for Permutation Feature Importance: for a faster and more logical way of running this, try clustered Permutation Feature Importance (https://scikit-learn.org/stable/auto_examples/inspection/plot_permutation_importance_multicollinear.html#sphx-glr-auto-examples-inspection-plot-permutation-importance-multicollinear-py). Essentially, group your 138 features into several groups (by which variables are most similar), and then run permutation feature importance on each entire group, not on individual variables.
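The grouping step can be sketched as follows, in the spirit of the linked sklearn example: cluster features hierarchically on their rank correlations, then keep one representative per cluster. The cluster-cut threshold `t=1.0` is an arbitrary choice you would tune for your data.

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr
from sklearn.datasets import make_classification

# Synthetic stand-in for the real feature matrix.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

# Use 1 - |Spearman correlation| as a distance between features.
corr = spearmanr(X).correlation
corr = (corr + corr.T) / 2          # enforce exact symmetry
np.fill_diagonal(corr, 1.0)
linkage = hierarchy.ward(squareform(1 - np.abs(corr)))

# Cut the dendrogram into clusters of mutually similar features.
cluster_ids = hierarchy.fcluster(linkage, t=1.0, criterion="distance")

# Keep the first feature of each cluster as its representative.
selected = [int(np.flatnonzero(cluster_ids == c)[0])
            for c in np.unique(cluster_ids)]
print(f"{len(selected)} clusters from {X.shape[1]} features:", selected)
```

You would then run permutation importance either on the representatives, or by permuting all members of a cluster together.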
If that's a bit too advanced, simpler feature selection methods include forward stepwise selection (add variables one at a time), backward stepwise selection (remove variables one at a time), and LASSO regression (a type of regression that simultaneously fits a model and drops obviously bad variables, though 138 features might be too many to feed into it directly). These are all relatively straightforward to implement, and a quick search should give you good intuition and code for each of them.
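As one illustration, here is a minimal LASSO sketch using sklearn's `LassoCV`, again on synthetic regression data as a stand-in: the cross-validation picks the regularization strength, and features whose coefficients shrink to exactly zero are effectively dropped.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: only 5 of the 30 features carry signal.
X, y = make_regression(n_samples=500, n_features=30, n_informative=5,
                       noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)   # LASSO is scale-sensitive

# LassoCV tunes the penalty strength alpha by cross-validation.
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
kept = np.flatnonzero(lasso.coef_)
print(f"kept {kept.size} of {X.shape[1]} features:", kept)
```

For a classification target like emotion labels, the analogue would be L1-penalized logistic regression rather than LASSO itself.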
Side note: There are certain algorithms (like RandomForest) that often do not benefit greatly from feature selection. So, if you are familiar with that technique and don't want to fuss with feature selection, that could be an option as well.
I hope this gives you good reasoning for why feature selection might be helpful even when correlation between variables and target is low, as well as some guidance about how you can get started running some feature selection on your data.