Can we express logistic loss minimization as a maximum likelihood problem?

Tags: classification, logistic, maximum-likelihood

I have a simple question about the equivalence of loss minimization and likelihood maximization for logistic regression.

Say we are given some data $(x_i, y_i)$ for $i = 1, \ldots, N$, where $x_i \in \mathbb{R}^d$ and $y_i \in \{-1,1\}$.

In logistic regression, we fit the coefficient vector $w \in \mathbb{R}^d$ of a linear classification model by minimizing the logistic loss. This gives the optimization problem:

$$ \min_w \sum_{i=1}^N \log(1+\exp(-y_iw^Tx_i)) $$
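For concreteness, here is a minimal numerical sketch of this problem (the data is synthetic, and `scipy.optimize.minimize` is just one convenient solver; none of these names come from the actual setting):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N, d = 100, 3
X = rng.normal(size=(N, d))                   # rows are the x_i in R^d
w_true = rng.normal(size=d)
y = np.sign(X @ w_true + rng.normal(size=N))  # noisy labels in {-1, 1}

def logistic_loss(w):
    # sum_i log(1 + exp(-y_i w^T x_i)); logaddexp(0, t) = log(1 + e^t) avoids overflow
    return np.logaddexp(0.0, -y * (X @ w)).sum()

w_star = minimize(logistic_loss, np.zeros(d), method="BFGS").x
```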

Let $w^*$ be the optimal set of coefficients obtained from this problem.

I am wondering: can we recover $w^*$ by solving a maximum likelihood problem? If so, what is the exact maximum likelihood problem that we would need to solve? And what would be the corresponding likelihood function for $w$?

Best Answer

Yes: it is equivalent to the maximum likelihood approach. The different appearance comes only from the different (and arbitrary) coding of the labels $y_i$.

Keeping in mind that $y_i \in \{-1,1\}$, and denoting

$$\Lambda(z) = [1+\exp(-z)]^{-1}$$

we have that

$$\min_w \sum_{i=1}^N \log[1+\exp(-y_iw^Tx_i)] = \max_w \sum_{i=1}^N \log \Lambda(y_iw^Tx_i),$$

since $\log[1+\exp(-z)] = -\log \Lambda(z)$ and flipping the sign turns minimization into maximization. Splitting the sum by label and using the symmetry $1-\Lambda(z) = \Lambda(-z)$ for the $y_i = -1$ terms, this equals

$$\max_w \Big\{\sum_{y_i=1} \log \Lambda(w^Tx_i) + \sum_{y_i=-1} \log [1-\Lambda(w^Tx_i)]\Big\} \tag{1}$$

In the usual maximum likelihood approach we would have used labels $z_i \in \{0,1\}$ ($0$ instead of $-1$), and we would have assumed that

$$P(z_i =1 \mid x_i) = \Lambda(w^Tx_i)$$

With an independent sample, the log-likelihood to maximize would be

$$\max_w \Big\{\sum_{i=1}^N z_i\log \Lambda(w^Tx_i) + \sum_{i=1}^N (1-z_i)\log [1-\Lambda(w^Tx_i)]\Big\} \tag{2}$$
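As a sketch, objective $(2)$ in code (the function name and the use of `np.log1p` for numerical care are my choices, not part of the answer):

```python
import numpy as np

def bernoulli_loglik(w, X, z):
    """Log-likelihood (2): labels z_i in {0, 1}, P(z_i = 1 | x_i) = Lambda(w^T x_i)."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))   # Lambda(w^T x_i)
    return np.sum(z * np.log(p) + (1 - z) * np.log1p(-p))
```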

Now set $z_i = (y_i+1)/2$, so that $z_i = 1$ exactly when $y_i = 1$ and $z_i = 0$ exactly when $y_i = -1$. Then

$$\sum_{y_i=1} \log \Lambda(w^Tx_i) = \sum_{i=1}^N z_i\log \Lambda(w^Tx_i)$$

and

$$\sum_{y_i=-1} \log [1-\Lambda(w^Tx_i)] = \sum_{i=1}^N(1-z_i) \log [1-\Lambda(w^Tx_i)]$$

so $(1)$ and $(2)$ are the same problem, and solving the maximum likelihood problem $(2)$ recovers $w^*$.
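
A quick numerical sanity check of this equivalence (a sketch with synthetic data; all names here are mine): with $z_i = (y_i+1)/2$, the two objectives agree at any $w$, not just at the optimum.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 50, 4
X = rng.normal(size=(N, d))
y = rng.choice([-1.0, 1.0], size=N)      # labels coded as in the question
z = (y + 1) / 2                          # the same labels recoded to {0, 1}
w = rng.normal(size=d)                   # an arbitrary coefficient vector

# objective (1): sum_i log Lambda(y_i w^T x_i), i.e. minus the logistic loss
obj1 = -np.logaddexp(0.0, -y * (X @ w)).sum()

# objective (2): Bernoulli log-likelihood with 0/1 labels
p = 1.0 / (1.0 + np.exp(-(X @ w)))
obj2 = np.sum(z * np.log(p) + (1 - z) * np.log1p(-p))

print(np.isclose(obj1, obj2))            # True
```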