SVC's predict simply uses its decision function, which is (up to scaling) the signed distance from the separating hyperplane.
According to the sklearn documentation, SVC's predict_proba behaves as follows:
"The probability model is created using cross validation, so the results can be slightly different than those obtained by predict. Also, it will produce meaningless results on very small datasets."
Many more details are here. You will have to read the Wu et al. (2004) paper mentioned in that section to figure out exactly how they did it; I am not familiar with it.
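To see the difference between the two code paths, here is a minimal sketch on synthetic data. The dataset, the linear kernel, and all variable names are my own assumptions for illustration. It checks that, in the binary case, predict matches the sign of decision_function, and it measures how often the arg-max of predict_proba agrees with predict (which is not guaranteed to be always).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic binary problem (an assumption; any binary dataset would do).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = SVC(kernel="linear", probability=True, random_state=0).fit(X, y)

# predict() follows the sign of the decision function in the binary case.
scores = clf.decision_function(X)
labels_from_scores = clf.classes_[(scores > 0).astype(int)]
agree_scores = np.mean(labels_from_scores == clf.predict(X))

# predict_proba() comes from a separately fitted probability model,
# so its arg-max may occasionally disagree with predict().
proba = clf.predict_proba(X)
labels_from_proba = clf.classes_[np.argmax(proba, axis=1)]
agree_proba = np.mean(labels_from_proba == clf.predict(X))

print(f"decision_function vs predict agreement: {agree_scores:.3f}")
print(f"predict_proba arg-max vs predict agreement: {agree_proba:.3f}")
```

Any disagreement in the second number comes from records close to the decision boundary.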
Here's what I would recommend: Use probability rankings and class proportions in the training sample to determine the class assignments.
You have three (estimated) probabilities: $p_a, p_b,$ and $p_c$. You also have the original class proportions from the training sample: $m_a, m_b,$ and $m_c$, where $m_a$ is the proportion of training records that belong to class $a$ (e.g., 0.6), and so on.
You can start with the smallest class, say $b$, and use $p_b$ to rank order all records from the highest to the lowest value. Working down this rank-ordered list, assign each record to class $b$ until a fraction $m_b$ of the records has been assigned to this class. Record the value of $p_b$ at this stage; this value becomes the cut-off point for class $b$.
Now take the next smallest class, say $c$, and use $p_c$ to rank order all records and follow the same steps described in the paragraph above. At the end of this step, you will get a cut-off value for $p_c$, and $m_c$ percent of all records would be assigned to class $c$.
Finally, assign all remaining records to (the largest) class $a$.
For future scoring purposes, you can follow the same steps but discard the class proportions and let the probability cut-off values for classes $b$ and $c$ drive the class assignments.
In order to make sure that this approach yields a reasonable level of accuracy, you can review the classification matrix (and any other measures you are using) on the validation set.
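The steps above can be sketched roughly as follows. This is only an illustration under my own assumptions: the function names `fit_cutoffs` and `apply_cutoffs` are hypothetical, the probabilities are random numbers rather than model output, and records already assigned to a smaller class are excluded when ranking for the next class (the description above does not say how to handle such overlaps).

```python
import numpy as np

def fit_cutoffs(proba, proportions):
    """Assign classes smallest-first using training proportions.

    proba: (n_records, n_classes) estimated probabilities.
    proportions: training share of each class (sums to 1).
    Returns ({class: cut-off}, labels); the largest class gets no cut-off.
    """
    proportions = np.asarray(proportions)
    n = proba.shape[0]
    order = np.argsort(proportions)           # smallest class first
    labels = np.full(n, -1)
    cutoffs = {}
    for cls in order[:-1]:
        free = np.flatnonzero(labels == -1)   # assumption: skip assigned records
        quota = min(int(round(proportions[cls] * n)), free.size)
        if quota == 0:
            continue
        ranked = free[np.argsort(-proba[free, cls])]  # highest p first
        chosen = ranked[:quota]
        labels[chosen] = cls
        cutoffs[int(cls)] = proba[chosen[-1], cls]    # last p that made the quota
    labels[labels == -1] = order[-1]          # remainder goes to the largest class
    return cutoffs, labels

def apply_cutoffs(proba, cutoffs, default_class):
    """Score new records with the stored cut-offs only (no proportions)."""
    labels = np.full(proba.shape[0], default_class)
    taken = np.zeros(proba.shape[0], dtype=bool)
    for cls, cut in cutoffs.items():          # same smallest-first order
        hit = ~taken & (proba[:, cls] >= cut)
        labels[hit] = cls
        taken |= hit
    return labels

# Toy data standing in for predict_proba output (classes a, b, c = 0, 1, 2).
rng = np.random.default_rng(0)
proba = rng.dirichlet([1.0, 1.0, 1.0], size=10)
proportions = [0.6, 0.1, 0.3]

cutoffs, train_labels = fit_cutoffs(proba, proportions)
new_labels = apply_cutoffs(proba, cutoffs, default_class=int(np.argmax(proportions)))
print(cutoffs)
print(np.bincount(train_labels, minlength=3))
```

On real data you would fit the cut-offs on the training probabilities and then check the classification matrix of `apply_cutoffs` on the validation set.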
Best Answer
Following @cangrejo's answer (https://stats.stackexchange.com/a/310956/194535), suppose the original output probability of your model is the vector $v$. You can then define the prior distribution:
$\pi=(\frac{1}{\theta_1}, \frac{1}{\theta_2}, \ldots, \frac{1}{\theta_N})$, for $\theta_i \in (0,1)$ and $\sum_i\theta_i = 1$, where $N$ is the total number of labeled classes and $i$ is the class index.
Take $v' = v \odot \pi$ as the new output probability of your model, where $\odot$ denotes an element-wise product.
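As a quick numeric illustration (the numbers are made up), dividing each probability by its $\theta_i$ can change which class wins the arg-max, which is why the $\theta_i$ behave like per-class thresholds:

```python
import numpy as np

v = np.array([0.6, 0.3, 0.1])        # original model output for one record
theta = np.array([0.7, 0.2, 0.1])    # hypothetical thresholds, summing to 1
pi = 1.0 / theta                     # the weights defined above

v_new = v * pi                       # element-wise product v' = v ⊙ π

print(v.argmax(), v_new.argmax())    # winning class before and after
```

Here class 0 wins on the raw probabilities, but class 1 wins after reweighting because 0.3 is well above its threshold of 0.2 while 0.6 is below its threshold of 0.7.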
Now your question can be reformulated as follows: find the $\pi$ that optimizes the metric you have specified (e.g. `roc_auc_score`), computed from the new output probabilities. Once you find it, the $\theta$s $(\theta_1, \theta_2, \ldots, \theta_N)$ are your optimal thresholds for each class.

The Code part:
Create a `proxyModel` class that takes your original model object as an argument and returns a `proxyModel` object. When you call `predict_proba()` through the `proxyModel` object, it automatically calculates the new probabilities based on the thresholds you specified:

Implement a score function:
Define a `weighted_score_with_threshold()` function, which takes the thresholds as input and returns the weighted score:

Use an optimization algorithm, `differential_evolution()` (better than `fmin`), to find the optimal thresholds:
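Here is a minimal sketch of the three steps, not a definitive implementation. Several things are my own assumptions: the synthetic data, the `LogisticRegression` base model, the use of weighted F1 as the "weighted score", and the optimizer settings. The names `proxyModel` and `weighted_score_with_threshold` come from the description above; their bodies are illustrative.

```python
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

class proxyModel:
    """Wraps a fitted classifier and rescales its probabilities by 1/theta."""

    def __init__(self, origin_model):
        self.origin_model = origin_model

    def predict_proba(self, X, threshold_list=None):
        proba = self.origin_model.predict_proba(X)
        if threshold_list is None:
            return proba
        return proba / np.asarray(threshold_list)    # v' = v ⊙ (1/θ)

    def predict(self, X, threshold_list=None):
        scores = self.predict_proba(X, threshold_list)
        return self.origin_model.classes_[np.argmax(scores, axis=1)]

def weighted_score_with_threshold(theta, model, X, y):
    """Weighted F1 (an assumed metric) of the thresholded predictions."""
    theta = np.asarray(theta) / np.sum(theta)        # enforce sum(theta) = 1
    return f1_score(y, model.predict(X, theta), average="weighted")

# Imbalanced synthetic 3-class data (an assumption for illustration).
X, y = make_classification(n_samples=600, n_classes=3, n_informative=4,
                           weights=[0.7, 0.2, 0.1], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

proxy = proxyModel(LogisticRegression(max_iter=1000).fit(X_train, y_train))
n_classes = len(proxy.origin_model.classes_)

# differential_evolution minimizes, so negate the score.
result = differential_evolution(
    lambda t: -weighted_score_with_threshold(t, proxy, X_val, y_val),
    bounds=[(0.01, 0.99)] * n_classes,
    x0=[1.0 / n_classes] * n_classes,   # start from "no reweighting"
    seed=0, maxiter=20, popsize=10,
    polish=False,                       # objective is piecewise constant
)

theta_opt = result.x / result.x.sum()
baseline_score = weighted_score_with_threshold(
    [1.0 / n_classes] * n_classes, proxy, X_val, y_val)
best_score = -result.fun
print("thresholds:", theta_opt)
print("baseline:", baseline_score, "optimized:", best_score)
```

Since the thresholds are tuned on the validation set here, you would want a separate held-out set to get an honest estimate of the final score.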