Solved – How do we use logistic regression (scikit-learn) to predict values

predictive-models, scikit-learn

Logistic regression can help predict whether an event will happen or not. I'd like to know how to do that using sklearn.

I'd like to know the probability that the event will happen.

I have a large dataset (20K rows and 20 columns): 19 columns are predictors and the last column is the target (values between 0 and 10). To simplify things, I am using random data to understand how to interpret the results.

A, B, C: the predictors

target: the target

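import numpy as np
import pandas as pd
from sklearn import linear_model

# Random stand-in data. LogisticRegression needs discrete class labels,
# so the target is drawn as integers 0-10 rather than continuous floats.
dataset = pd.DataFrame({'A': np.random.rand(100)*1000, 'B': np.random.rand(100)*100, 'C': np.random.rand(100)*10, 'target': np.random.randint(0, 11, 100)})

# .ix has been removed from pandas; select the columns by label instead.
predictors = dataset[['A', 'B', 'C']].values
target = dataset['target'].values  # 1-D array of labels, as fit() expects

lr = linear_model.LogisticRegression()

lr.fit(predictors, target)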

linear_model.LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, penalty='l2', random_state=None, tol=0.0001)

Now, should I plot lr.predict_proba to get the probability for every element?

What should I do to get the probability for every row?

Best Answer

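In scikit-learn, all classifiers implement the ClassifierMixin interface. To do so, they must provide the fit() method. The LogisticRegression classifier provides two methods to get the probability for each sample: predict_proba(), which returns the class probabilities, and predict_log_proba(), which returns their logarithms (see the documentation on the Logistic Classifier for more details). Both return one row per sample and one column per class, with the columns ordered as in the fitted model's classes_ attribute.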
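As a minimal sketch of how these two methods could be used on data shaped like the question's, the snippet below fits the model and attaches a per-row probability to the DataFrame. The binary target (1 = the event happens, 0 = it does not), the fixed random seed, the max_iter=1000 setting, and the p_event column name are assumptions made for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Random stand-in data, as in the question. The binary target is an
# assumption: 1 means the event happens, 0 means it does not.
dataset = pd.DataFrame({
    'A': rng.random(100) * 1000,
    'B': rng.random(100) * 100,
    'C': rng.random(100) * 10,
    'target': rng.integers(0, 2, 100),
})

predictors = dataset[['A', 'B', 'C']].values
target = dataset['target'].values

# max_iter is raised because the predictors are not scaled (assumption).
lr = LogisticRegression(max_iter=1000)
lr.fit(predictors, target)

# One row per sample, one column per class, ordered as in lr.classes_.
proba = lr.predict_proba(predictors)
log_proba = lr.predict_log_proba(predictors)

# Pick the column that belongs to class 1 and store it next to the data.
event_col = list(lr.classes_).index(1)
dataset['p_event'] = proba[:, event_col]  # hypothetical column name
print(dataset.head())
```

With the 0-10 target from the question, the same calls would return eleven columns, one per class, and each row of predict_proba() would still sum to 1. No plotting is needed to obtain the probabilities; a plot is only one way to inspect them afterwards.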

There are a few examples that use the logistic regression model on the scikit-learn examples page.