Solved – Regression with features that are not very correlated with the dependent variable

Tags: correlation, feature-selection, feature-engineering, regression

I am trying to use regression techniques and neural networks to predict the values of a continuous dependent variable (y) based on a set of features (X1, X2, …, XN), where N is the total number of distinct features. I plotted the correlation coefficients between y and each of these features and found that every correlation is very low (the highest is around 0.14).


So my question is: is there any hope of training a machine learning model on these features, given that the correlations are so small?

Are there ways to transform these features, so that they will correlate more with y?

Best Answer

Meant this as a comment, but it grew quite long.

Yeah, it's not all lost. You didn't really detail the problem and the data available to you, but I'll try to give an informed opinion here.

  • The low correlations by themselves do not mean much. Keep in mind that (Pearson) correlation measures only linear association, so a strongly non-linear relationship can produce a near-zero coefficient and still be captured by a learning algorithm.

  • Besides that, you will presumably use all the independent variables at once in your model, so you can expect better performance than any single feature's marginal correlation suggests.

  • You also haven't tested the predictive power of interactions between features.

  • Non-linear algorithms may also find underlying structure that is not obvious from looking at the variables individually.

  • Above everything else, you can't really know beforehand how well your model will perform; you have to test it. Good luck!
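The first and third points above can be sketched numerically. This is a minimal illustration on synthetic (made-up) data showing how a near-zero Pearson correlation can coexist with an almost deterministic relationship, and how an interaction term can predict y when neither feature does on its own:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# 1) Strong non-linear association, near-zero Pearson correlation:
#    y is (almost) a deterministic function of x, yet corr(x, y) ~ 0
#    because the relationship is symmetric, not linear.
x = rng.uniform(-1, 1, n)
y = x ** 2 + 0.01 * rng.normal(size=n)
r_linear = np.corrcoef(x, y)[0, 1]

# 2) Interaction: neither x1 nor x2 correlates with y on its own,
#    but their product predicts y almost perfectly.
x1 = rng.uniform(-1, 1, n)
x2 = rng.uniform(-1, 1, n)
y2 = x1 * x2 + 0.01 * rng.normal(size=n)
r_x1 = np.corrcoef(x1, y2)[0, 1]
r_x2 = np.corrcoef(x2, y2)[0, 1]
r_interaction = np.corrcoef(x1 * x2, y2)[0, 1]

print(f"corr(x,  x^2 + noise) = {r_linear:+.3f}")   # near zero
print(f"corr(x1, y2)          = {r_x1:+.3f}")       # near zero
print(f"corr(x2, y2)          = {r_x2:+.3f}")       # near zero
print(f"corr(x1*x2, y2)       = {r_interaction:+.3f}")  # near one
```

A tree-based model or a neural network would have no trouble learning either relationship, which is why the marginal correlation table alone shouldn't decide whether modelling is worthwhile.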
