Logistic Regression – sklearn Logistic Regression Converging to Unexpected Coefficient

Tags: logistic-regression, scikit-learn, sigmoid-curve

The case is as follows:

Suppose that

import numpy as np

X = np.array([1, 1, 1])
y = np.array([1, 0, 1])

Then I perform a logistic regression with no intercept to check out the fitted coefficient:

from sklearn.linear_model import LogisticRegression

def fit_predict(X, y, fit_intercept=True):
    # reshape X into a single-column design matrix, fit, and report the coefficient
    model = LogisticRegression(fit_intercept=fit_intercept)
    model.fit(X.reshape(-1, 1), y)
    print(f'model coefficients: {model.coef_}')

fit_predict(X, y, fit_intercept=False)

# output: [[0.2865409]]

I am pretty confused by this output. According to my algebra (directly solving the first-order condition of the optimization), the coefficient should be $\operatorname{logit}(2/3) = \log 2 \approx 0.6931471805599452$.

Is this because my math is wrong, or because something else is going on that I don't know about?

The algebra is as follows, starting from the score equation (the derivative of the log-likelihood with respect to $\beta$, set to zero):

$$ \sum_i \left( y_i - \operatorname{sigmoid}(\beta x_i) \right) x_i = 0 $$

Since every $x_i = 1$, plugging the values in gives $$2 = 3 \cdot \operatorname{sigmoid}(\beta).$$

From this I conclude that $\beta = \operatorname{logit}(2/3)$.
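
For reference, a quick numerical check of this value (purely illustrative, using scipy's logit):

import numpy as np
from scipy.special import logit

# logit(2/3) = log((2/3) / (1/3)) = log(2)
print(logit(2 / 3))  # ~0.6931
print(np.log(2))     # same value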

Thanks in advance.

Best Answer

I will add my own answer to this question in order to shed some light on why a penalty is applied by default. I'm also posting for posterity: you are not the first person to get caught out by this, and you won't be the last.

Back in 2019, Zachary Lipton discovered that sklearn applies the penalty by default, and this sparked a very intense debate on Twitter and elsewhere. The long and the short of that discussion is that sklearn sees itself as a machine learning library first, which in their eyes means they favor other things over unbiasedness and estimation of effects. The most striking example of their philosophy (in my opinion) comes when Andreas Mueller plainly asks why someone would want an unbiased implementation of logistic regression. Inference simply isn't on their radar.

Hence, LogisticRegression is not, de jure, logistic regression: by default it is a penalized variant thereof (and the default penalty doesn't even make much sense). There is another sharp edge. If you learned about penalized logistic regression à la ridge regression or the LASSO, you may be surprised to learn that sklearn parameterizes the penalty as the inverse of the regularization strength. Hence, setting $\lambda = 2$ in the LASSO or ridge regression corresponds to C=0.5 in LogisticRegression.
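
To make the parameterization concrete, here is a small sketch on the question's data: since C is the inverse regularization strength, increasing it weakens the penalty, and the fitted coefficient climbs toward the unpenalized solution $\log 2 \approx 0.693$.

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([1, 1, 1]).reshape(-1, 1)
y = np.array([1, 0, 1])

# C is the INVERSE regularization strength: larger C means a weaker penalty
for C in (0.1, 1.0, 100.0):
    model = LogisticRegression(fit_intercept=False, C=C).fit(X, y)
    print(f'C={C}: coef={model.coef_[0][0]:.4f}')

# the coefficient shrinks toward 0 for small C and approaches log(2) for large C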

Let me sum up by making this completely unambiguous.

If you are intent on estimating the effects of some covariates on a binary outcome, and you insist on using Python: do not use sklearn. Use statsmodels.
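
For instance, a minimal statsmodels sketch on the question's data recovers the unpenalized MLE:

import numpy as np
import statsmodels.api as sm

X = np.array([1, 1, 1]).reshape(-1, 1)
y = np.array([1, 0, 1])

# statsmodels fits the plain (unpenalized) MLE; no intercept is added unless requested
result = sm.Logit(y, X).fit(disp=0)
print(result.params)  # ~[0.6931], i.e. logit(2/3) = log(2)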

If, however, you insist on using sklearn, remember that you need to set penalty=None (or penalty='none' in older versions) when instantiating the model. Otherwise, your estimates will be biased towards the null (by design).
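
Concretely, on the question's data (assuming a scikit-learn version recent enough to accept penalty=None):

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([1, 1, 1]).reshape(-1, 1)
y = np.array([1, 0, 1])

# penalty=None disables regularization (use penalty='none' on older sklearn versions)
model = LogisticRegression(fit_intercept=False, penalty=None).fit(X, y)
print(model.coef_)  # ~[[0.6931]], matching logit(2/3)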
