When using logistic regression in Python's scikit-learn, one may handle multiclass problems even with binary logistic regression. If there are $K$ classes, then coefficients (i.e. weights and biases) for $K$ logistic functions will be produced. But this uses a 'one vs. rest' approach, and the probabilities from the individual logistic functions won't necessarily add up to 1, since each one is a separate binary logistic regression. Therefore, when using `predict_proba` with sklearn's logistic regression, how are probabilities handled in multiclass problems?
I've investigated this and it appears to be similar to applying a softmax function to the individual probabilities of the $K$ logistic functions, but this is not exactly correct. I also do not see explicit mention of this in the documentation.
Best Answer
It appears that they apply simple normalization (i.e. divide each probability by the sum of the $K$ probabilities) when the `multi_class` option is set to `ovr`, and a softmax over the $K$ decision values when it is set to `multinomial`.
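To make the difference between the two rules concrete, here is a minimal sketch in plain Python. It does not call scikit-learn and is not its actual implementation; the function names and the `scores` values are hypothetical, standing in for the $K$ decision values (weights times features plus bias) that a fitted model would produce for one sample.

```python
import math


def sigmoid(z):
    """Binary logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def ovr_probabilities(scores):
    """One-vs-rest rule: apply a sigmoid to each score independently,
    then divide by the sum so the K values add up to 1."""
    raw = [sigmoid(z) for z in scores]
    total = sum(raw)
    return [p / total for p in raw]


def softmax_probabilities(scores):
    """Multinomial rule: softmax over the K scores.
    Subtracting the max is only for numerical stability."""
    shift = max(scores)
    exps = [math.exp(z - shift) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical decision values for one sample and K = 3 classes.
scores = [2.0, -1.0, 0.5]

ovr = ovr_probabilities(scores)
multinomial = softmax_probabilities(scores)

print("ovr:        ", ovr, "sum =", sum(ovr))
print("multinomial:", multinomial, "sum =", sum(multinomial))
```

Both results sum to 1, but they are different distributions for the same scores, which is why the normalized one-vs-rest output looks similar to a softmax without matching it exactly. In a real comparison the scores themselves would also differ, since the one-vs-rest models are fit separately while the multinomial model is fit jointly.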