I am experimenting with logistic regression to predict a binary target variable.
Using Stata, I have generated predicted probabilities between 0 and 1.
Now I am trying to work out how to translate these probabilities into a binary classification. Using a rule like "<50%==0 and >=50%==1" feels arbitrary, but I haven't found a better solution so far.
Any ideas?
Best Answer
A ROC (receiver operating characteristic) curve (http://en.wikipedia.org/wiki/Receiver_operating_characteristic) is one way of finding a good cutoff and is widely used for this purpose.
In Stata you can use roctab, roccomp, rocfit, rocgold, rocreg, and rocregplot for this; they are documented at http://www.stata.com/manuals14/rroc.pdf.
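Each cutoff corresponds to a single point on the ROC curve (its sensitivity plotted against 1 - specificity), so the area under the curve describes the model as a whole rather than any one cutoff. A common choice is the cutoff whose point lies closest to the top-left corner of the plot, where sensitivity and specificity are both high; see the figure at http://www.adscience.eu/uploads/ckfiles/files/html_files/StatEL/statel_ROC_curve.htm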
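To make the idea concrete, here is a minimal sketch in plain Python rather than Stata. It assumes one particular criterion, Youden's J (sensitivity + specificity - 1), which is only one of several ways to pick a point on the ROC curve. The function name `youden_cutoff` and the data are made up for illustration, and the rule "predict 1 when the probability is >= the cutoff" is an assumption as well.

```python
def youden_cutoff(y_true, p_hat):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    Every distinct predicted probability is tried as a cutoff, and an
    observation is classified as 1 when its probability is >= the cutoff.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    best = None
    for c in sorted(set(p_hat)):
        # True positives: actual 1s at or above the cutoff.
        tp = sum(1 for y, p in zip(y_true, p_hat) if y == 1 and p >= c)
        # True negatives: actual 0s below the cutoff.
        tn = sum(1 for y, p in zip(y_true, p_hat) if y == 0 and p < c)
        sens = tp / n_pos
        spec = tn / n_neg
        j = sens + spec - 1
        # Strict ">" keeps the lowest cutoff when several tie.
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical outcomes and predicted probabilities (not real data).
y = [0, 0, 0, 0, 1, 1, 0, 1, 1, 1]
p = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.70, 0.80, 0.90]

cutoff, sens, spec = youden_cutoff(y, p)
labels = [1 if pi >= cutoff else 0 for pi in p]
print(cutoff, sens, spec)
```

On this toy data the function picks 0.35, not 0.5, with sensitivity 1.0 and specificity 0.8. If false positives and false negatives have different costs for you, a different criterion may be more appropriate than Youden's J.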