Alright, you can hit me now - after a couple of days of thinking I figured out the answer myself - but in case someone runs into the same problem, please read on:
The answer is very (!) simple. Since this method is called proportional odds logistic regression, the coefficients are of course the same for every level of the dependent variable, and you get two thresholds for three DV levels. (The thresholds are the negative intercepts, hence the minus signs below.)
You just do this:
# Level 1: P(Y = 1) = P(Y <= 1) = 1 - exp(eta - zeta_1) / (1 + exp(eta - zeta_1))
log_pred_probs1 <- 1 - (exp(-logit_model$zeta[1] +
  logit_model$coefficients[1] * IV_1 + logit_model$coefficients[2] * IV_2) /
  (1 + exp(-logit_model$zeta[1] + logit_model$coefficients[1] * IV_1 +
  logit_model$coefficients[2] * IV_2)))

# Level 2: P(Y = 2) = P(Y <= 2) - P(Y <= 1)
log_pred_probs2 <- 1 - (exp(-logit_model$zeta[2] +
  logit_model$coefficients[1] * IV_1 + logit_model$coefficients[2] * IV_2) /
  (1 + exp(-logit_model$zeta[2] + logit_model$coefficients[1] * IV_1 +
  logit_model$coefficients[2] * IV_2))) - log_pred_probs1
Notice the two thresholds (zeta[1] and zeta[2]) AND the subtraction of the first level's probability!
and finally
log_pred_probs3 <- 1 - log_pred_probs2 - log_pred_probs1
It's as easy as that. Cheers!
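For anyone who wants to sanity-check the arithmetic outside R, here is a minimal sketch of the same cumulative-logit computation in Python/NumPy. The threshold and coefficient values below are made up purely for illustration; in practice they would come from the fitted model (the zeta and coefficients shown above):

    import numpy as np

    # Hypothetical fitted values: two thresholds (zeta) for three DV levels,
    # one coefficient per predictor, and a single observation (IV_1, IV_2).
    zeta = np.array([-0.5, 1.2])
    beta = np.array([0.8, -0.3])
    x = np.array([1.5, 2.0])

    eta = x @ beta                          # linear predictor
    cum = 1 / (1 + np.exp(-(zeta - eta)))   # P(Y <= 1), P(Y <= 2)

    p1 = cum[0]                 # P(Y = 1)
    p2 = cum[1] - cum[0]        # P(Y = 2), note the subtraction
    p3 = 1 - cum[1]             # P(Y = 3)
    print(p1, p2, p3, p1 + p2 + p3)  # the three probabilities sum to 1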
As the probabilities of the classes must sum to one, we can either define n-1 independent coefficient vectors, or n coefficient vectors that are linked by the equation \sum_c P(y=c) = 1. The two parametrizations are equivalent.
See also the Wikipedia article Multinomial logistic regression, section "As a log-linear model".
For a class c, we have a probability P(y=c) = e^{b_c \cdot X} / Z, with Z a normalization that accounts for the constraint \sum_c P(y=c) = 1.
These probabilities are the expected probabilities of a class given the coefficients. In scikit-learn they can be computed with predict_proba.
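As a sanity check, here is a short sketch (with made-up data) showing that, for the multinomial (softmax) model that recent scikit-learn versions fit by default on multiclass problems, predict_proba is exactly the normalized e^{b_c \cdot X} / Z built from coef_ and intercept_:

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.linear_model import LogisticRegression

    # Hypothetical 3-class toy problem with 2 features.
    X, y = make_blobs(n_samples=300, centers=3, n_features=2, random_state=0)

    # Recent scikit-learn fits the multinomial (softmax) model for multiclass y;
    # older versions defaulted to one-vs-rest, where this check would not hold.
    clf = LogisticRegression(max_iter=1000).fit(X, y)

    # Scores b_c . x + I_c for every class, then the softmax normalization Z.
    scores = X @ clf.coef_.T + clf.intercept_
    probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

    print(np.allclose(probs, clf.predict_proba(X)))  # True for the multinomial model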
To get better insight into the coefficients, consider the left plot in this example: http://scikit-learn.org/dev/_images/plot_logistic_multinomial_001.png
In this example there are 3 classes a, b, c and 2 features x0, x1. The class label is denoted y.
After fitting a multinomial logistic regression, each class has a coefficient vector C with 2 components (one per feature): (C_a0, C_a1), (C_b0, C_b1), (C_c0, C_c1). There is also an intercept (aka bias) I for each class, which is always one-dimensional: I_a, I_b, I_c.
The dashed lines represent the hyperplanes defined by C and I: for example, for class a, the hyperplane is defined by the equation x0 * C_a0 + x1 * C_a1 + I_a = 0. This is the hyperplane where P(y=a) = e^{x0 * C_a0 + x1 * C_a1 + I_a} / Z = 1 / Z.
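A quick numeric sketch of that last statement, using made-up coefficients (not the ones from the plot): for any point lying on class a's hyperplane, the unnormalized score is e^0 = 1, so P(y=a) = 1/Z.

    import numpy as np

    # Hypothetical coefficient vectors and intercepts for classes a, b, c (2 features).
    C = np.array([[ 1.0, -0.5],      # (C_a0, C_a1)
                  [-0.3,  0.8],      # (C_b0, C_b1)
                  [-0.7, -0.3]])     # (C_c0, C_c1)
    I = np.array([0.2, -0.1, -0.1])  # I_a, I_b, I_c

    # Pick a point on class a's hyperplane: x0 * C_a0 + x1 * C_a1 + I_a = 0.
    x0 = 1.0
    x1 = -(C[0, 0] * x0 + I[0]) / C[0, 1]
    x = np.array([x0, x1])

    scores = C @ x + I
    Z = np.exp(scores).sum()
    print(np.exp(scores[0]) / Z, 1 / Z)  # both values are equal: P(y=a) = 1/Z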
If C_a0 is positive, when x0 increases P(y=a) increases. If C_a0 is negative, when x0 increases P(y=a) decreases.
However, this is not the decision boundary. The decision boundary between classes a and b is defined by the equation P(y=a) = P(y=b), which is e^{x0 * C_a0 + x1 * C_a1 + I_a} = e^{x0 * C_b0 + x1 * C_b1 + I_b}, or again x0 * C_a0 + x1 * C_a1 + I_a = x0 * C_b0 + x1 * C_b1 + I_b.
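Here is a small sketch of that equation, again with made-up values (the names C_a, C_b, I_a, I_b are just hypothetical placeholders): it picks a point on the a/b boundary and checks that the two unnormalized scores, and hence the two probabilities, are equal.

    import numpy as np

    # Hypothetical coefficient vectors and intercepts for classes a and b (2 features).
    C_a, I_a = np.array([1.0, -0.5]), 0.2
    C_b, I_b = np.array([-0.3, 0.8]), -0.1

    # Boundary: x . C_a + I_a = x . C_b + I_b  <=>  x . (C_a - C_b) + (I_a - I_b) = 0
    x0 = 2.0
    x1 = -((C_a[0] - C_b[0]) * x0 + (I_a - I_b)) / (C_a[1] - C_b[1])
    x = np.array([x0, x1])

    # Equal scores mean equal e^{score}, hence P(y=a) = P(y=b) after dividing by Z.
    print(x @ C_a + I_a, x @ C_b + I_b)  # the two scores are equal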
This boundary hyperplane is visible in the plot as the border between the background colors. If C_a0 - C_b0 is positive, when x0 increases P(y=a) / P(y=b) increases, since P(y=a) / P(y=b) = e^{x0 * (C_a0 - C_b0) + x1 * (C_a1 - C_b1) + (I_a - I_b)}. If C_a0 - C_b0 is negative, when x0 increases P(y=a) / P(y=b) decreases.
Best Answer
The library creates $c$ neurons for $c$ classes when $c>2$, which yields $c\times f$ coefficients and $c$ biases, where $f$ is the number of features. So it's like a collection of logistic regressions, or like a single-layer neural network.
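For instance, in scikit-learn (the answer above doesn't name the library, so this is just one concrete illustration with made-up data), the fitted coefficient and bias arrays have exactly those $c\times f$ and $c$ shapes:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # Hypothetical data with c = 4 classes and f = 3 features.
    X, y = make_classification(n_samples=400, n_features=3, n_informative=3,
                               n_redundant=0, n_classes=4, random_state=0)

    clf = LogisticRegression(max_iter=1000).fit(X, y)

    print(clf.coef_.shape)       # (4, 3): c x f coefficients
    print(clf.intercept_.shape)  # (4,):   c biases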