I have to implement a weighted majority vote, which is an aggregation technique that produces one final prediction. Several classifiers each make a local prediction; these are then collected and combined using a weighted majority rule to output the final prediction.
In this article, soft voting is defined as follows:
$$ \hat{y} = \arg \max_i \sum^{m}_{j=1} w_j p_{ij} $$
I don't understand the predicted class probabilities $p$ for each classifier. Is $p$ the number of occurrences of an activity divided by the total number of instances in the training set?
Is there a clear explanation for $p_{ij}$? I want to compute $p_{ij}$ manually.
Best Answer
$p_{ij}$ is the probability assigned to the $i$-th category by the $j$-th classifier. For example, suppose you have a binary classification problem (cat vs. non-cat) and two classifiers: a logistic regression and a neural network with a logistic link on the output layer. You make a prediction for some example, and the logistic regression says that the probability that it is a cat is 0.328, while the neural network says that it is 0.21. Now the weighted majority rule says that
$$ \begin{align} \text{score}(\text{cat}) &= w_1 0.328 + w_2 0.21 \\ \text{score}(\text{non-cat}) &= w_1 (1-0.328) + w_2 (1-0.21) \end{align} $$
where $w_1$ and $w_2$ are the weights assigned to the two classifiers. The class with the greater score "wins" the competition and is taken as your classification.
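To compute this by hand, here is a minimal Python sketch of the rule, using the two probabilities from the example. The function name `soft_vote` and the equal weights $w_1 = w_2 = 0.5$ are my own assumptions for illustration; neither comes from the article. Note that `probs[j][i]` in the code corresponds to $p_{ij}$ in the formula.

```python
def soft_vote(probs, weights):
    """Weighted soft vote.

    probs[j][i] is the probability classifier j assigns to class i
    (p_ij in the formula); weights[j] is w_j.
    Returns the index of the winning class and the per-class scores.
    """
    n_classes = len(probs[0])
    # score(i) = sum over classifiers j of w_j * p_ij
    scores = [
        sum(w * p[i] for w, p in zip(weights, probs))
        for i in range(n_classes)
    ]
    # argmax over classes
    winner = max(range(n_classes), key=lambda i: scores[i])
    return winner, scores


classes = ["cat", "non-cat"]
probs = [
    [0.328, 1 - 0.328],  # logistic regression
    [0.21, 1 - 0.21],    # neural network
]
weights = [0.5, 0.5]  # assumed equal weights, purely for illustration

winner, scores = soft_vote(probs, weights)
print(classes[winner], scores)
```

With these assumed weights the scores come out to roughly 0.269 for cat and 0.731 for non-cat, so non-cat wins. Different weights can change the scores, and when the classifiers disagree they can also change the winner.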