Solved – Defining cutoff point for logistic regression

classification, logistic, regression

I am experimenting with logistic regression to predict a binary target variable.

Using Stata, I have generated predicted probabilities between 0 and 1.

Now, I am trying to think about how to translate these probabilities into the binary classification. Using a rule like "<50%==0 and >=50%==1" feels arbitrary, but I haven't found a better solution so far.

Any ideas?

Best Answer

The ROC (receiver operating characteristic) curve (http://en.wikipedia.org/wiki/Receiver_operating_characteristic) is one way of finding a good cutoff and is widely used for this purpose. From the Stata manual (http://www.stata.com/manuals14/rroc.pdf):

"ROC analysis quantifies the accuracy of diagnostic tests or other evaluation modalities used to discriminate between two states or conditions."

You can use roctab, roccomp, rocfit, rocgold, rocreg, and rocregplot in Stata for this purpose.

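Each cutoff corresponds to one point on the ROC curve, that is, one pair of sensitivity and 1 - specificity. The area under the curve summarizes how well the model discriminates across all cutoffs, so it is a property of the model rather than of any single cutoff and cannot by itself pick one. A common choice is the cutoff whose point lies closest to the top-left corner of the plot, or the one that maximizes sensitivity + specificity - 1 (Youden's index). If false positives and false negatives carry different costs, the cutoff should be shifted to reflect that. An example ROC curve is shown at http://www.adscience.eu/uploads/ckfiles/files/html_files/StatEL/statel_ROC_curve.htm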

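To make the ROC-based choice of cutoff concrete, here is a minimal sketch in Python rather than Stata. It assumes you have exported the observed outcomes and the predicted probabilities, and it picks the cutoff that maximizes Youden's index (sensitivity + specificity - 1), which is only one of several reasonable criteria. The data and the function names roc_points and best_cutoff_youden are made up for illustration.

```python
# Minimal sketch: choose a cutoff by maximizing Youden's index
# J = sensitivity + specificity - 1 over the observed predicted probabilities.
# The data below are hypothetical and only serve as an illustration.

def roc_points(y_true, p_hat):
    """Return (cutoff, sensitivity, specificity) for each distinct predicted probability."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = []
    for cutoff in sorted(set(p_hat)):
        # Classify as 1 when the predicted probability is at or above the cutoff.
        tp = sum(1 for y, p in zip(y_true, p_hat) if y == 1 and p >= cutoff)
        tn = sum(1 for y, p in zip(y_true, p_hat) if y == 0 and p < cutoff)
        points.append((cutoff, tp / n_pos, tn / n_neg))
    return points


def best_cutoff_youden(y_true, p_hat):
    """Return the (cutoff, sensitivity, specificity) triple with the largest Youden's index."""
    return max(roc_points(y_true, p_hat), key=lambda t: t[1] + t[2] - 1)


# Hypothetical observed outcomes and model-predicted probabilities.
y = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
p = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]

cutoff, sens, spec = best_cutoff_youden(y, p)
print(f"cutoff={cutoff:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

On this made-up data the sketch selects 0.60 rather than 0.50, with sensitivity 0.80 and specificity 1.00. Only observed predicted probabilities are tried as cutoffs, so any value above 0.40 and up to 0.60 would give the same classification here. A cutoff tuned on the estimation sample tends to look better than it will on new data, so it is worth checking it on held-out observations.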