Regression – Loss Function for Regression Using Artificial Neural Networks

cross-entropy, loss-functions, neural-networks, regression

For classification, one of the most common loss functions for artificial neural networks (ANNs) is cross-entropy. What about ANNs for regression?

And why is cross-entropy hardly ever discussed in encyclopedia entries on ANNs?

Best Answer

The common loss function for regression with ANNs is quadratic loss (least squares). If you're learning about NNs from popular online courses and books, you'll be told that classification and regression are the two common kinds of problems where NNs are applied: the former fits a categorical output Y, such as man/woman, while the latter fits a continuous output Y. Hence, in regression, least squares is quite popular, while it makes no sense in many classification applications.
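As a minimal sketch of what this looks like in practice (assuming PyTorch; the network size and the toy data below are made up for illustration):

```python
import torch
import torch.nn as nn

# Toy regression data: predict a continuous y from one feature x.
torch.manual_seed(0)
x = torch.linspace(-1.0, 1.0, 100).unsqueeze(1)   # shape (100, 1)
y = 2.0 * x + 0.1 * torch.randn_like(x)           # noisy linear target

# A small feed-forward network with a linear (identity) output layer,
# as is usual for regression.
model = nn.Sequential(
    nn.Linear(1, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
)

loss_fn = nn.MSELoss()            # quadratic (least squares) loss
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(200):
    opt.zero_grad()
    loss = loss_fn(model(x), y)   # mean of (y_hat - y)^2
    loss.backward()
    opt.step()
```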

However, cross-entropy loss is used in regression too. For instance, when predicting the probability of default of loans, the historical observations of a loan pool carry labels of 0 (current) and 1 (default). This looks like a classification problem, but it is really a logit regression: you forecast the probability of default (PD) of a loan pool using a logistic activation in the output layer and cross-entropy loss. The trick is that the PD of a pool is a continuous quantity, while the individual loan-state observations are clearly categorical, "default" or "current".
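A sketch of that setup, again assuming PyTorch (the loan features and pool data here are hypothetical; `BCEWithLogitsLoss` folds the logistic activation into the cross-entropy loss):

```python
import torch
import torch.nn as nn

# Hypothetical loan features (e.g. income, loan-to-value) and observed
# states: 1 = default, 0 = current.
torch.manual_seed(0)
features = torch.randn(500, 2)
defaults = (features[:, :1].sigmoid() > torch.rand(500, 1)).float()

model = nn.Sequential(
    nn.Linear(2, 8),
    nn.ReLU(),
    nn.Linear(8, 1),              # raw logit output
)
loss_fn = nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy
opt = torch.optim.Adam(model.parameters(), lr=0.01)

for step in range(300):
    opt.zero_grad()
    loss = loss_fn(model(features), defaults)
    loss.backward()
    opt.step()

# The categorical 0/1 labels train a network whose output, after the
# sigmoid, is a continuous PD in (0, 1):
pool_pd = torch.sigmoid(model(features)).mean()
```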

Why would you start with least squares and not cross-entropy? The reason is that least squares is the foundational concept, while cross-entropy is somewhat more advanced. I can explain least squares to an elementary-school student; I can't do the same with "entropy." In fact, most people who use cross-entropy do not have the slightest clue what entropy is.
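For concreteness, here are the two losses side by side for $N$ observations (the notation is mine: $\hat{y}_i$ is the network's output for observation $i$, and $p_i$ its predicted probability that $y_i = 1$):

$$\text{least squares:}\quad L=\frac{1}{N}\sum_{i=1}^{N}\left(y_i-\hat{y}_i\right)^2, \qquad \text{cross-entropy:}\quad L=-\frac{1}{N}\sum_{i=1}^{N}\bigl[y_i\log p_i+(1-y_i)\log(1-p_i)\bigr].$$

The first is just an average of squared distances; the second needs probabilities and logarithms, which is why it is harder to motivate from scratch.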