Since you are working on a speech emotion recognition problem, I assume the data is quite complex, and that you are not using simple linear methods like linear regression. Please correct me if I'm wrong in this assumption.
General Notes about your problem:
- Don't forget that Pearson correlation only captures linear relationships between variables. There might be non-linear (polynomial, logarithmic, etc.) relationships between your variables, or even step-function-like ones. All of these would be poorly captured by Pearson correlation.
- Since your Pearson correlations are low, the relationships in the dataset (if any) very well might be non-linear and complex, especially given the subject of the dataset. If there are complex or non-obvious relationships to be discovered, 4,000 data points may not be enough. It's a decent amount, and it might suffice depending on your model, but keep in mind that it certainly isn't a huge amount of data by any stretch, especially given how many features you have. Think about it this way - your model has to identify relationships between all 138 features and the target variable, and it's only given 4,000 data points to do so. It might not be able to capture everything there is, so it might make sense to whittle down your feature set.
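To make the first note concrete, here is a minimal sketch (illustrative data, not yours) where a feature fully determines the target yet has near-zero Pearson correlation with it:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

# Symmetric non-linear relationship: y is fully determined by x,
# yet the Pearson correlation is essentially zero.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 10_000)
y = x ** 2

r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r: {r:.3f}")  # near 0 despite a perfect functional link

# A mutual-information estimate, by contrast, detects the dependence.
mi = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)[0]
print(f"Mutual information: {mi:.3f}")  # clearly positive
```

So a near-zero Pearson value on its own rules out very little.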
That's a perfect setup to directly answer your question:
Yes, it absolutely makes sense to investigate feature selection techniques.
Reasons:
- Just because the Pearson correlation is low doesn't necessarily mean there's no relationship. Feature selection methods might help you quickly figure out whether there is any more complex relationship to be discovered.
- For the reason mentioned in the second note above: if there are unhelpful variables within your 138, then depending on your choice of model, it can be very helpful to get rid of them so your model can focus on the relationships between the actually useful variables and the target.
Pointers to get started on feature selection:
Again, it depends on your model, but broadly speaking, I would heavily recommend some version of Permutation Feature Importance to figure out which features are helpful. Read more here: https://scikit-learn.org/stable/modules/permutation_importance.html
There are various packages that implement it, like sklearn in Python and Boruta in R.
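A minimal scikit-learn sketch of the idea (the toy dataset and model here are placeholders, not your data):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Toy regression data: with shuffle=False, only the first 5 of 20
# features actually carry signal.
X, y = make_regression(n_samples=1000, n_features=20, n_informative=5,
                       noise=10.0, shuffle=False, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle each feature on held-out data and measure the drop in score;
# a large drop means the model was relying on that feature.
result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=0)
ranked = np.argsort(result.importances_mean)[::-1]
for idx in ranked[:5]:
    print(f"feature {idx}: {result.importances_mean[idx]:.3f}")
```

Swap in your own model and data; the importance scores then tell you which of your 138 features the model actually uses.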
Quick tip for Permutation Feature Importance: for a faster and more principled way of running this, try clustered Permutation Feature Importance (https://scikit-learn.org/stable/auto_examples/inspection/plot_permutation_importance_multicollinear.html#sphx-glr-auto-examples-inspection-plot-permutation-importance-multicollinear-py). Essentially, group your 138 features into several clusters (by which variables are most similar), and then run permutation feature importance on each group as a whole, not on individual variables.
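The grouping step from that linked example can be sketched roughly like this (illustrative data and cut-off threshold; in practice you would cluster your 138 features the same way):

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr
from sklearn.datasets import make_regression

X, _ = make_regression(n_samples=500, n_features=12, random_state=0)

# Rank correlation between every pair of features.
corr = spearmanr(X).correlation
corr = (corr + corr.T) / 2   # enforce exact symmetry
np.fill_diagonal(corr, 1.0)

# Turn correlation into a distance and cluster hierarchically.
dist = 1.0 - np.abs(corr)
linkage = hierarchy.ward(squareform(dist, checks=False))
cluster_ids = hierarchy.fcluster(linkage, t=1.0, criterion="distance")

# One representative feature per cluster, as in the sklearn example.
representatives = [int(np.flatnonzero(cluster_ids == c)[0])
                   for c in np.unique(cluster_ids)]
print("clusters:", cluster_ids)
print("representative features:", representatives)
```

The threshold `t` controls how coarse the grouping is; tune it by inspecting the dendrogram, as the sklearn example does.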
If that's a bit too advanced, simpler feature selection methods include forward stepwise selection (add variables one at a time), backward stepwise selection (remove variables one at a time), and LASSO regression (a type of regression that simultaneously fits a model and zeroes out obviously unhelpful variables; 138 might be too many to feed straight into it, though). These are all relatively straightforward to implement, and a quick search should give you good intuition/code for each of them.
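For instance, a minimal LASSO selection sketch (toy data and parameters, purely illustrative):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Toy data: only the first 5 of 30 features are informative.
X, y = make_regression(n_samples=1000, n_features=30, n_informative=5,
                       noise=5.0, shuffle=False, random_state=0)

# Standardize first: the LASSO penalty is scale-sensitive.
X_scaled = StandardScaler().fit_transform(X)

# LassoCV picks the regularization strength by cross-validation;
# features whose coefficients shrink to exactly zero are dropped.
lasso = LassoCV(cv=5, random_state=0).fit(X_scaled, y)
selected = np.flatnonzero(lasso.coef_)
print("features kept by LASSO:", selected)
```

The surviving feature indices are your reduced set; you can then feed those into whatever non-linear model you like.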
Side note: there are certain algorithms (like RandomForest) that often do not benefit greatly from feature selection. So, if you are familiar with that technique and don't want to fuss with feature selection, that could be an option as well.
I hope this gives you good reasoning for why feature selection might be helpful even when correlation between variables and target is low, as well as some guidance about how you can get started running some feature selection on your data.
Best Answer
Meant this as a comment, but it grew quite long.
Yeah, it's not all lost. You didn't really detail the problem and the data available to you, but I'll try to give an informed opinion here.
Actually, the low correlations by themselves do not tell you much. Keep in mind that correlation is a measure of linear association, so you can get low values for strongly non-linear associations that learning algorithms can still capture.
Besides that, you are probably using all independent variables at once in your model, so you can expect better performance than any of them would give separately.
You also haven't tested the predictive power of interactions between features.
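A quick way to test that, sketched on synthetic data where the signal lives entirely in an interaction (everything here is illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: the target is a pure x0*x1 interaction plus noise,
# so no single feature predicts it on its own.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=500)

plain = Ridge()
with_interactions = make_pipeline(
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    Ridge(),
)

score_plain = cross_val_score(plain, X, y, cv=5).mean()
score_inter = cross_val_score(with_interactions, X, y, cv=5).mean()
print(f"R^2, main effects only:  {score_plain:.3f}")
print(f"R^2, with interactions:  {score_inter:.3f}")
```

A jump in cross-validated score after adding interaction terms is evidence that pairwise interactions carry predictive power in your data too.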
And also keep in mind non-linear algorithms may find some underlying organization that's not obvious looking at the variables.
Above all, you can't really know beforehand how your model will perform; you have to test it, so good luck!