I have some experience with building classification models in R but it's my first time with Python's sklearn.
The problem is this: when fitting a logistic regression model in R, I have immediate access to the predicted class probabilities (model$fitted.values), so I can set my own threshold (other than 0.5) in order to maximize some measure.
But in sklearn, after fitting, I can't find a way to access the probabilities. Is it possible? There is a method predict_proba(), but, as the name suggests, it is for prediction. So in order to get the probabilities, should I 'artificially' do the following?
import sklearn.linear_model

model = sklearn.linear_model.LogisticRegression()
model.fit(train_X, train_y)
# Probabilities for the training data itself, one column per class
probs = model.predict_proba(train_X)
Does this make sense? Or is there a different way to obtain them?
Best Answer
It is correct to use the method predict_proba to get the estimated probabilities. Just to make it a bit clearer, consider the example given here (http://scikit-learn.org/stable/auto_examples/linear_model/plot_iris_logistic.html).
Below I copy just the relevant lines, so that anyone can copy and paste them and reproduce what I say next:
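A minimal sketch of that setup, assuming the linked example's approach of fitting on the first two iris features. The variable names X, Y and logreg, and the max_iter value, are assumptions chosen here for illustration and may not match the example exactly:

```python
from sklearn import datasets, linear_model

# Load iris and keep only the first two features (sepal length and width).
iris = datasets.load_iris()
X = iris.data[:, :2]
Y = iris.target  # integer class labels 0, 1, 2

# max_iter is raised from the default as a precaution against
# convergence warnings; it is an assumption, not part of the question.
logreg = linear_model.LogisticRegression(max_iter=1000)
logreg.fit(X, Y)
```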
In the iris dataset you have three classes (0, 1, 2 = ['setosa', 'versicolor', 'virginica']), contained in Y. When you call predict_proba(X) on the fitted model, you get an array of arrays of probabilities, one inner array per sample, each element being P(class|x).
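To make the shape of that result concrete, here is a small self-contained sketch. The names logreg and probs and the max_iter setting are illustrative assumptions:

```python
import numpy as np
from sklearn import datasets, linear_model

iris = datasets.load_iris()
X, Y = iris.data[:, :2], iris.target
logreg = linear_model.LogisticRegression(max_iter=1000).fit(X, Y)

# One row per sample, one column per class.
probs = logreg.predict_proba(X)
print(probs.shape)
print(probs[:3])        # estimated P(class|x) for the first three samples

# Columns follow the order of logreg.classes_, and each row sums to 1.
print(logreg.classes_)
print(np.allclose(probs.sum(axis=1), 1.0))
```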
The other alternatives are predict_log_proba() to get log P(class|x) and predict() to get the class with highest probability among the three for a given sample.
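Coming back to the original question about choosing a threshold other than 0.5: the following is a rough sketch, not a recommended recipe. It reduces iris to a two-class problem (versicolor vs. virginica), compares the three methods, and then scans a grid of thresholds for the best F1 score. The threshold grid, the choice of F1 and all variable names are assumptions made for illustration:

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Binary problem: keep classes 1 and 2, relabel virginica as the positive class.
iris = datasets.load_iris()
mask = iris.target != 0
X = iris.data[mask]
y = (iris.target[mask] == 2).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)

p1 = model.predict_proba(X)[:, 1]   # P(y = 1 | x), comparable to R's fitted.values
log_p = model.predict_log_proba(X)  # elementwise log of predict_proba
default_pred = model.predict(X)     # most probable class for each sample

# Scan candidate thresholds and keep the one with the best F1 score.
thresholds = np.linspace(0.05, 0.95, 19)
scores = [f1_score(y, (p1 >= t).astype(int)) for t in thresholds]
best_t = thresholds[int(np.argmax(scores))]
custom_pred = (p1 >= best_t).astype(int)

print(best_t, max(scores))
```

Note that a threshold tuned on the same data the model was fitted on tends to look better than it really is, so choosing it on a held-out set or via cross-validation is usually safer.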