Solved – Calculating Costs for ROC Curves

classification, roc, threshold

I am trying to find the optimal threshold for a binary classifier using receiver operating characteristic (ROC) curves. Currently I assign one cost to each false negative and another cost to each false positive, then run a linear optimization program to minimize the total cost: Cost1*(number of FN) + Cost2*(number of FP). My two questions are:

  • Is there a formal way to calculate the costs to assign to the
    mislabeled instances? I have looked for published papers on this
    topic but I could not find any.
  • Is there a better way to find the optimal threshold?

Best Answer

There is no way to "calculate" the costs of mislabeled instances; they depend on the underlying subject matter and the purpose of your classification scheme. You have to decide for yourself how much it "costs," in your application, to have a false positive or a false negative. You might even want to weigh those costs against the "benefits" of true positives and of true negatives. (Those two correct classifications might have different benefits for your application.) Only relative costs and benefits are needed for this, not absolute values in currency units.
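
To illustrate why only relative values matter, here is a minimal Python sketch. It assumes the classifier outputs well-calibrated probabilities, which the answer does not state, and the function name and the example numbers are hypothetical. Under that assumption, calling a case "positive" has the higher expected payoff once its probability exceeds (cost_fp + benefit_tn) / (cost_fp + benefit_tn + cost_fn + benefit_tp).

```python
def cost_optimal_probability_threshold(cost_fp, cost_fn, benefit_tp=0.0, benefit_tn=0.0):
    """Return the probability cutoff above which calling a case 'positive'
    has the higher expected payoff.

    Assumes calibrated probabilities. All four arguments are hypothetical
    values that the analyst has to supply for the application.
    """
    # A wrong 'positive' call pays the false-positive cost and
    # forgoes the true-negative benefit.
    stake_if_truly_negative = cost_fp + benefit_tn
    # A wrong 'negative' call pays the false-negative cost and
    # forgoes the true-positive benefit.
    stake_if_truly_positive = cost_fn + benefit_tp
    total = stake_if_truly_negative + stake_if_truly_positive
    if total <= 0:
        raise ValueError("costs and benefits must sum to a positive value")
    return stake_if_truly_negative / total


# Same 1:4 cost ratio expressed in two different units.
print(cost_optimal_probability_threshold(cost_fp=1, cost_fn=4))
print(cost_optimal_probability_threshold(cost_fp=25, cost_fn=100))
```

Both calls return the same cutoff because multiplying every cost and benefit by a constant leaves the ratio unchanged.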

You also have to consider whether you really want to commit to a binary classifier at this point in your analysis. Although many situations do end up with a forced yes/no decision, sometimes you may also want to say "I need more information" before a final decision is made. If your classifier is being used together with other information to make some final yes/no decision, then you might be better off keeping the probabilities of class membership and combining that continuous estimate with the other information before you make the final decision.

In terms of finding a threshold that minimizes net costs, your general approach is a reasonable way to start. You should, however, make sure that your cost estimates are not unduly tied to the particulars of your present data sample. For example, you could repeat your entire model-building process with cross-validation and choose the process that provides the minimum net cost on the held-out cases over multiple cross-validation sets. You might not produce the same model as you would from the full data set, but what you get might generalize better to new cases.
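
The threshold search and the held-out check can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, each fold is assumed to hold scores from a model that was fit without that fold, and a direct sweep over the observed scores stands in for the linear program in the question.

```python
def total_cost(scores, labels, threshold, cost_fn, cost_fp):
    """Cost of calling a case positive when its score is >= threshold."""
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return cost_fn * fn + cost_fp * fp


def best_threshold(scores, labels, cost_fn, cost_fp):
    """Sweep every observed score (plus 'never positive') and keep the cheapest."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates,
               key=lambda t: total_cost(scores, labels, t, cost_fn, cost_fp))


def cross_validated_cost(folds, cost_fn, cost_fp):
    """Mean held-out cost over folds.

    folds is a list of (scores, labels) pairs, one per fold (assumed
    layout). The threshold for each fold is chosen on the other folds only.
    """
    held_out_costs = []
    for i, (test_scores, test_labels) in enumerate(folds):
        train_scores = [s for j, (sc, _) in enumerate(folds) if j != i for s in sc]
        train_labels = [y for j, (_, lb) in enumerate(folds) if j != i for y in lb]
        t = best_threshold(train_scores, train_labels, cost_fn, cost_fp)
        held_out_costs.append(total_cost(test_scores, test_labels, t, cost_fn, cost_fp))
    return sum(held_out_costs) / len(held_out_costs)


# Toy data: a false negative is assumed to cost five times a false positive.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(best_threshold(scores, labels, cost_fn=5, cost_fp=1))
```

Comparing `cross_validated_cost` across candidate modeling processes, rather than the cost on the data used to pick the threshold, is one way to follow the advice about held-out cases.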
