Solved – Why don’t we train neural networks to maximize linear correlation instead of error

Tags: correlation, error, machine learning, neural networks, regression

Recently, a project I've been part of has involved training neural networks so that we maximize the Pearson correlation between actual and predicted values. This got me thinking: why don't we change the mathematics of, say, gradient descent so that instead of minimizing RMSE we maximize $r$? If we can make the network predict with a high correlation, all we have to do is chain a linear function onto the predictions and we have a good predictor.
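
For concreteness, here is a minimal sketch of what such an objective could look like as a differentiable loss, assuming a PyTorch setup (the name `pearson_loss` and the `eps` stabiliser are illustrative choices, not something from the original post):

```python
import torch

def pearson_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Negative Pearson correlation between pred and target (1-D tensors)."""
    pred_c = pred - pred.mean()        # center the predictions
    target_c = target - target.mean()  # center the targets
    # r = covariance / (std_pred * std_target), written with sums over the batch
    r = (pred_c * target_c).sum() / (
        torch.sqrt((pred_c ** 2).sum()) * torch.sqrt((target_c ** 2).sum()) + eps
    )
    return -r  # maximizing r is the same as minimizing -r
```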

Best Answer

Because that would be a completely different objective altogether. Note that, unlike MSE, the Pearson correlation is maximal (equal to $1$) whenever there is an exact increasing linear relationship between the two variables, not only when they are equal. This means that

  • The network would "think" it has learned correctly as soon as its output is roughly proportional (or, more generally, affinely related) to the dependent variable, rather than equal or close to it. Predicting $Y$, $2Y$, or $Y + 3$ would therefore be equivalent (and $-Y$ as well, if one maximizes $|r|$ or $r^2$). This is generally undesirable, since we would like the network's predictions to be close to the targets themselves, not merely proportional to them; see the sketch after this list.

  • The optimisation problem thus posed would not have a unique global optimum: every affine rescaling of an optimal predictor, as described above, is equally optimal. This is undesirable from a numerical point of view and can lead to instability.
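
To make the non-uniqueness concrete, here is a small illustration (NumPy, with hypothetical data) showing that every increasing affine transform of the target attains the same maximal correlation, and why a separate linear calibration step would still be needed:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=100)  # stand-in for the true targets

# Every increasing affine transform of y achieves the same (maximal) correlation.
for pred in (y, 2 * y, y + 3, 5 * y - 1):
    print(np.corrcoef(y, pred)[0, 1])  # all ~ 1.0

# A correlation objective therefore cannot distinguish these "predictors";
# a separate linear fit, e.g. slope, intercept = np.polyfit(pred, y, 1),
# would still be needed to map predictions back onto the target's scale.
```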
