Logistic Regression – Correct Loss Function


I read about two versions of the loss function for logistic regression, which of them is correct and why?

  1. From *Machine Learning* by Zhou Z.-H. (in Chinese), with $\beta = (w, b)\text{ and }\beta^Tx=w^Tx +b$:

    $$l(\beta) = \sum\limits_{i=1}^{m}\Big(-y_i\beta^Tx_i+\ln(1+e^{\beta^Tx_i})\Big) \tag 1$$

  2. From my college course, with $z_i = y_if(x_i)=y_i(w^Tx_i + b)$:

    $$L(z_i)=\log(1+e^{-z_i}) \tag 2$$


I know that the first one is a sum over all samples and the second one is for a single sample, but I am more curious about the difference in the form of the two loss functions. Somehow I have a feeling that they are equivalent.
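That feeling can be tested numerically. The sketch below assumes the first form takes labels $y_i \in \{0,1\}$ (written `t` here) and the second takes $y_i \in \{-1,1\}$ via $y_i = 2t_i - 1$; the scores and labels are made up for illustration:

```python
import math
import random

random.seed(0)

# Made-up data for illustration: scores s_i = beta^T x_i, and labels
# t_i in {0,1} for form (1); form (2) uses y_i = 2*t_i - 1 in {-1,1}.
scores = [random.uniform(-3, 3) for _ in range(100)]
t_labels = [random.randint(0, 1) for _ in range(100)]

# Form (1): sum_i ( -t_i * s_i + ln(1 + e^{s_i}) )
loss1 = sum(-t * s + math.log(1 + math.exp(s))
            for t, s in zip(t_labels, scores))

# Form (2): sum_i ln(1 + e^{-y_i * s_i}) with y_i = 2*t_i - 1
loss2 = sum(math.log(1 + math.exp(-(2 * t - 1) * s))
            for t, s in zip(t_labels, scores))

print(abs(loss1 - loss2) < 1e-9)  # the two sums agree
```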

Best Answer

The relationship is as follows: $l(\beta) = \sum_i L(z_i)$.

Define the logistic function as $f(z) = \frac{e^{z}}{1 + e^{z}} = \frac{1}{1+e^{-z}}$. It has the property that $f(-z) = 1-f(z)$, or in other words:

$$ \frac{1}{1+e^{z}} = \frac{e^{-z}}{1+e^{-z}}. $$

If you take the reciprocal of both sides, then take the log, you get:

$$ \ln(1+e^{z}) = \ln(1+e^{-z}) + z. $$

Subtract $z$ from both sides and you should see this:

$$ -y_i\beta^Tx_i+\ln(1+e^{y_i\beta^Tx_i}) = L(z_i). $$
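As a quick sanity check of the algebra, the identity $\ln(1+e^{z}) - z = \ln(1+e^{-z})$ holds numerically (a Python sketch; the grid of $z$ values is arbitrary):

```python
import math

# Check ln(1 + e^z) - z == ln(1 + e^{-z}) on an arbitrary grid of z values.
zs = [-5.0, -1.0, -0.1, 0.0, 0.1, 1.0, 5.0]
max_diff = max(abs((-z + math.log(1 + math.exp(z))) - math.log(1 + math.exp(-z)))
               for z in zs)
print(max_diff < 1e-12)  # True: both sides agree to floating-point precision
```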

Edit:

At the moment I am re-reading this answer and am confused about how I got $-y_i\beta^Tx_i+\ln(1+e^{\beta^Tx_i})$ to be equal to $-y_i\beta^Tx_i+\ln(1+e^{y_i\beta^Tx_i})$. Perhaps there's a typo in the original question.

Edit 2:

In the case that there wasn't a typo in the original question, @ManelMorales appears to be correct to draw attention to the fact that, when $y \in \{-1,1\}$, the probability mass function can be written as $P(Y_i=y_i) = f(y_i\beta^Tx_i)$, due to the property that $f(-z) = 1 - f(z)$. I am re-writing it differently here, because he gives the notation $z_i$ a different meaning. The rest follows by taking the negative log-likelihood under each $y$ coding. See his answer below for more details.
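To illustrate that property, a small sketch (the value of $\beta^Tx$ is arbitrary) showing that $f(y\beta^Tx)$ assigns complementary probabilities to the two labels, and that its negative log recovers $L(z_i)$:

```python
import math

def f(z):
    # Logistic function f(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + math.exp(-z))

s = 0.7  # an arbitrary value of beta^T x

# f(-z) = 1 - f(z), so P(Y=1) + P(Y=-1) = f(s) + f(-s) = 1.
assert abs(f(s) + f(-s) - 1.0) < 1e-12

# For each coding y in {-1, 1}, -ln P(Y=y) = ln(1 + e^{-y*s}) = L(y*s).
for y in (-1, 1):
    nll = -math.log(f(y * s))
    assert abs(nll - math.log(1 + math.exp(-y * s))) < 1e-12
```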
