Solved – Why don’t we train neural networks to maximize linear correlation instead of error

correlation, error, machine learning, neural networks, regression

Recently, a project I've been part of has involved training neural networks to maximize the Pearson correlation between actual and predicted values. This made me wonder: why don't we change the mathematical workings of, say, gradient descent so that instead of minimizing RMSE, we maximize $r$? If we can make the network predict with a high correlation, all we have to do is chain a linear function to the predictions and we have a good prediction.
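To make the proposal concrete, here is a minimal NumPy sketch of the two steps described in the question: a loss equal to $-r$, followed by a linear function fitted on top of the raw predictions. The names `neg_pearson_loss` and `fit_linear_head`, and the synthetic data standing in for a network's output, are assumptions for illustration only and are not taken from any particular library.

```python
import numpy as np

def neg_pearson_loss(y_true, y_pred, eps=1e-12):
    """Negative Pearson correlation; minimizing it maximizes r."""
    yt = y_true - y_true.mean()
    yp = y_pred - y_pred.mean()
    r = (yt * yp).sum() / (np.sqrt((yt ** 2).sum() * (yp ** 2).sum()) + eps)
    return -r

def fit_linear_head(y_true, y_pred):
    """Least-squares slope and intercept mapping predictions onto targets."""
    slope, intercept = np.polyfit(y_pred, y_true, deg=1)
    return slope, intercept

rng = np.random.default_rng(0)
y_true = rng.normal(size=200)
# Hypothetical network output: perfectly correlated but on the wrong scale.
raw_pred = 3.0 * y_true + 2.0

loss = neg_pearson_loss(y_true, raw_pred)

# The "chained linear function" from the question, fitted after the fact.
slope, intercept = fit_linear_head(y_true, raw_pred)
calibrated = slope * raw_pred + intercept

rmse_raw = np.sqrt(np.mean((raw_pred - y_true) ** 2))
rmse_calibrated = np.sqrt(np.mean((calibrated - y_true) ** 2))
print(loss, rmse_raw, rmse_calibrated)
```

In this toy case the correlation loss is already at its optimum even though the raw RMSE is large; only the extra linear fit brings the predictions onto the scale of the targets.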

Best Answer

Because that would be a different objective altogether. Unlike MSE, the Pearson correlation reaches its maximum of 1 if and only if the predictions are an increasing linear function of the targets, whatever the slope and intercept. This means that:

1. The network would "think" it has learned its targets correctly as soon as its output is a positive linear function of the dependent variable samples, rather than equal (or close) to them. Predicting $Y$, $2Y$ or $Y + 3$ would therefore be equivalent (and $-Y$ as well if the objective were $r^2$ or $|r|$). This is generally undesirable, since we want the network's predictions to be close to the targets, not merely proportional to them.
2. The optimisation problem thus posed would have no unique global optimum. Any positive scale factor and any offset applied to an optimal output gives another optimal solution, so the loss leaves the scale of the output unconstrained. This is undesirable from a numerical point of view and can lead to instability.
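The invariance behind both points can be checked numerically. The sketch below uses synthetic targets and hypothetical noisy predictions (nothing here comes from a real model), rescales and shifts those predictions, and compares $r$ with the MSE for each version.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

def mse(a, b):
    """Mean squared error between two 1-D arrays."""
    return np.mean((a - b) ** 2)

rng = np.random.default_rng(1)
y = rng.normal(loc=5.0, scale=2.0, size=500)        # synthetic targets
base_pred = y + rng.normal(scale=0.5, size=y.size)  # hypothetical noisy predictions

# Each entry is (scale, offset) applied to the same predictions.
transforms = {"identity": (1.0, 0.0), "double": (2.0, 0.0), "shifted": (0.5, 10.0)}
results = {
    name: (pearson_r(y, a * base_pred + b), mse(y, a * base_pred + b))
    for name, (a, b) in transforms.items()
}

# A negative scale flips the sign of r instead of preserving it.
r_flipped = pearson_r(y, -base_pred)

for name, (r, err) in results.items():
    print(f"{name:>8}: r = {r:.6f}, MSE = {err:.3f}")
print(f" flipped: r = {r_flipped:.6f}")
```

All three transformed versions share the same $r$ while their MSE differs widely, which is the flat direction in the loss that the second point refers to.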
