Logistic Regression – Loss Function of scikit-learn LogisticRegression


I am having trouble understanding the loss function scikit-learn uses to fit logistic regression, which can be found here.

Specifically, I have a problem with the second term. It seems very different from the usual MLE criterion. Can someone give me a hint as to where this comes from?

$$\min_{w,c} \; \frac{1}{2} w^T w + C \sum_{i=1}^n \log\left( \exp(-y_i (X_i^T w + c)) + 1 \right)$$
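
For concreteness, here is a minimal NumPy sketch of that objective (my own illustration, not scikit-learn's internal code). It assumes the labels are encoded as $y_i \in \{-1, +1\}$, which is the convention this formula uses:

```python
import numpy as np

def sklearn_style_objective(w, c, X, y, C=1.0):
    """The objective above, assuming y holds labels in {-1, +1}.

    0.5 * w^T w  +  C * sum_i log(1 + exp(-y_i (X_i^T w + c)))
    """
    z = X @ w + c
    # np.logaddexp(0, t) computes log(1 + exp(t)) in a numerically stable way
    return 0.5 * (w @ w) + C * np.sum(np.logaddexp(0.0, -y * z))
```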

I think the log-likelihood of a logistic regression is usually something like the expression below. Clearly, the first term below is missing from the scikit-learn objective function.

$$\mathrm{LLH} = \sum_{i=1}^n \left[ y_i (X_i^T w + c) - \log\left( 1 + \exp(X_i^T w + c) \right) \right]$$
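
Here is the same kind of sketch for this LLH (again my own illustration). Note that, as written, it implicitly assumes labels encoded as $y_i \in \{0, 1\}$, the usual Bernoulli convention:

```python
import numpy as np

def log_likelihood(w, c, X, y01):
    """The LLH above, assuming y01 holds labels in {0, 1}."""
    z = X @ w + c
    # Each term is y_i * z_i - log(1 + exp(z_i))
    return np.sum(y01 * z - np.logaddexp(0.0, z))
```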

Best Answer

These two are actually (almost) equivalent because of the following property of the logistic function:

$$ \sigma(x) = \frac{1}{1+\exp(-x)} = \frac{\exp(x)}{\exp(x)+1} $$

Also

$$\begin{aligned} \sum_{i=1}^n \log\left( 1 + \exp(-y_i (X_i^T w + c)) \right) &= \sum_{i=1}^n \log\left[ \left( \exp(y_i (X_i^T w + c)) + 1 \right) \exp(-y_i (X_i^T w + c)) \right] \\ &= -\sum_{i=1}^n \left[ y_i (X_i^T w + c) - \log\left( \exp(y_i (X_i^T w + c)) + 1 \right) \right] \end{aligned}$$
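
The step above is just the identity $\log(1 + \exp(-t)) = \log(\exp(t) + 1) - t$ applied with $t = y_i (X_i^T w + c)$. A quick numerical check of that identity (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.normal(size=1000)  # t plays the role of y_i (X_i^T w + c)

lhs = np.logaddexp(0.0, -t)      # log(1 + exp(-t))
rhs = np.logaddexp(0.0, t) - t   # log(exp(t) + 1) - t
assert np.allclose(lhs, rhs)
```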

Note, though, that your LLH formula doesn't have $y_i$ inside the log, while this one does. (I guess this is a typo.)
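
For what it's worth, here is a numerical sanity check (my own, on made-up data) that the scikit-learn sum is exactly $-\mathrm{LLH}$ once the $\{0,1\}$ labels are recoded as $2y_i - 1 \in \{-1, +1\}$:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
w = rng.normal(size=5)
c = 0.3
y01 = rng.integers(0, 2, size=200)  # labels in {0, 1}
ypm = 2 * y01 - 1                   # recoded to {-1, +1}

z = X @ w + c
llh = np.sum(y01 * z - np.logaddexp(0.0, z))  # LLH from the question
nll = np.sum(np.logaddexp(0.0, -ypm * z))     # scikit-learn's second term
assert np.allclose(nll, -llh)
```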