Solved – Why is the default cost function choice of a neuron quadratic loss

machine learning, neural networks

I'm studying neural networks, and I'm trying to decide why the default choice of cost function for a single neuron seems to be quadratic loss:
$$\sum_i(y_i-f_i)^2,$$

instead of:

$$-\prod_ip_i^{y_i}(1-p_i)^{1-y_i},$$

as in logistic regression, where both $f_i$ and $p_i$ are outputs of the sigmoid (activation) function.
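For concreteness, the two candidates can be computed side by side for a single sigmoid neuron (a sketch in NumPy; the inputs, targets, weight, and bias here are all made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical data: inputs x, binary targets y, and one neuron's
# weight w and bias b (values invented for this example).
x = np.array([0.5, -1.2, 2.0, 0.3])
y = np.array([1.0, 0.0, 1.0, 0.0])
w, b = 1.5, -0.2

p = sigmoid(w * x + b)  # neuron output, playing the role of f_i and p_i

# Quadratic loss: sum_i (y_i - f_i)^2
quadratic_loss = np.sum((y - p) ** 2)

# Negated likelihood: -prod_i p_i^{y_i} (1 - p_i)^{1 - y_i}
neg_likelihood = -np.prod(p ** y * (1 - p) ** (1 - y))

print(quadratic_loss, neg_likelihood)
```

Since each $p_i$ lies in $(0,1)$, the product also lies in $(0,1)$, so the negated likelihood is always between $-1$ and $0$, while the quadratic loss is unbounded above as the number of samples grows.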

I understand that the neuron does not classify per se but instead modulates/dampens the output via its activation function, so that when many neurons are connected together to form the network, classification only needs to be done at the output node using some cut-off value.

Nevertheless, if classification is our goal, it seems like the right cost function should relate the firing strength directly to a probability: the probability that the neuron would fire if it could only fire at full strength or not at all. Minimizing that cost would then mean that, if we ran the neuron many times with that firing probability, the expected value of the firing would equal the strength with which it fires deterministically.

Is there a reason then that the logistic regression cost function is not the default choice?

Best Answer

The cost function derived in logistic regression is essentially the cross-entropy cost function in neural-network terminology. The difference is that cross-entropy takes the negative log of that likelihood product and divides it by the number of samples.
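That relationship is easy to check numerically: the negative log of the likelihood product equals the sum of per-sample cross-entropy terms, and dividing by the sample count gives the usual mean cross-entropy (a sketch with made-up probabilities and labels):

```python
import numpy as np

# Made-up predicted probabilities p_i and binary labels y_i.
p = np.array([0.9, 0.2, 0.7, 0.4])
y = np.array([1.0, 0.0, 1.0, 0.0])

# Negative log of the likelihood product from the question ...
neg_log_likelihood = -np.log(np.prod(p ** y * (1 - p) ** (1 - y)))

# ... equals the sum of per-sample cross-entropy terms,
# because log turns the product into a sum.
cross_entropy_sum = -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Dividing by the number of samples gives the mean cross-entropy.
mean_cross_entropy = cross_entropy_sum / len(y)

print(neg_log_likelihood, cross_entropy_sum, mean_cross_entropy)
```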

The reason for the log is ease of getting the gradient (the product becomes a sum), and dividing by a constant does not change the minimizer; it only rescales the gradient, which matters mainly as an effective learning-rate change in plain gradient descent.
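The "ease of getting the gradient" is concrete: with a sigmoid output, the cross-entropy gradient with respect to the pre-activation simplifies to $p - y$, while the quadratic loss keeps an extra sigmoid-derivative factor $p(1-p)$ that vanishes when the neuron saturates. A sketch (pre-activations and targets made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-4.0, 0.5, 4.0])  # made-up pre-activations
y = np.array([1.0, 0.0, 1.0])   # binary targets
p = sigmoid(z)

# d(cross-entropy)/dz: the sigmoid derivative cancels, leaving p - y.
grad_ce = p - y

# d/dz of 0.5*(p - y)^2: keeps the factor p*(1 - p), which is tiny
# when the neuron saturates (z = -4 here), so learning slows down.
grad_quad = (p - y) * p * (1 - p)

print(grad_ce)
print(grad_quad)
```

The first sample (z = -4, y = 1) is badly misclassified, yet its quadratic-loss gradient is much smaller in magnitude than its cross-entropy gradient, which is one practical argument for preferring cross-entropy with sigmoid units.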

Cross-entropy is in fact used quite often; the quadratic loss being introduced first in textbooks may be due to historical reasons.