Solved – Combining multiple classifiers

classification, conditional probability, ensemble learning, naive bayes, probability

I am trying to do a binary classification of text articles into {relevant, non-relevant}. Each article has the following features: article text, author & source, and an image. Hence, I have built three classifiers, each focusing on a different feature:

  1. the first is a Naive Bayes (NB) text classifier,
  2. the second draws distributions over author & source, and
  3. the third is an image classifier.

Each of these classifiers returns a probability of the article being relevant. If this probability is greater than 0.5, the article is classified as relevant, otherwise non-relevant.

Based on classification performance on validation data, I have estimated the accuracy of each of these models.

Problem: For an incoming test article, each of these models generates a probability score of the article being relevant. I want to produce a final probability score that takes the accuracy of each model into account.

What I Tried

  1. Currently, I am using a normalized weighted score, with the model accuracies as weights (a minimal sketch of this appears after the list).
  2. I also built an NB model conditioned on the output of each classifier, assuming conditional independence between the outputs (which is a flawed assumption given the problem statement; that's why I am not much inclined to use an NB model for this).
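
For concreteness, here is a minimal sketch of option 1. The accuracy values and per-classifier probabilities are hypothetical, chosen only to illustrate the arithmetic:

```python
import numpy as np

# Hypothetical validation accuracies of the three classifiers
# (text, author/source, image) -- illustrative values only.
accuracies = np.array([0.85, 0.70, 0.65])

# Probabilities of "relevant" produced by each classifier for one test article.
probs = np.array([0.90, 0.40, 0.55])

# Normalized weighted score: weights proportional to accuracy, summing to 1.
weights = accuracies / accuracies.sum()
combined = float(weights @ probs)

print(combined)  # ~0.64 for these numbers
print("relevant" if combined > 0.5 else "non-relevant")
```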

I feel I am just scratching the surface here, and there must be a good amount of literature on: 1) the merits/demerits of working with multiple models, and 2) combining the outputs of multiple classifiers.

However, I am not able to find the right set of articles (I am not sure what to call this type of problem; searching for ensemble/combination leads to Ensemble Learning (https://en.wikipedia.org/wiki/Ensemble_learning), which in my opinion is not exactly what I am trying to solve).

Best Answer

This may be helpful as well: Kuncheva, L. I. (2004). Combining Pattern Classifiers: Methods and Algorithms.

Edit: For my similar problem, I ended up computing classifier weights from accuracy values, as described in the question Assigning probabilities to ensemble experts (classification), using Theorem 4.2 (p. 127) from Kuncheva (2004), which says that the optimal combination weights in this setting are $w_i = \log\frac{p_i}{1 - p_i}$, where $p_i$ is the classification accuracy of the $i$-th expert. I then converted the weights to probabilities by projecting them onto the probability simplex. One can then combine those weights with the probabilities of each prediction that you already have to get the final probability value.
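
A minimal sketch of that recipe, assuming illustrative accuracy values and per-classifier probabilities (the specific numbers and the helper function are not from the answer; the projection uses a standard sort-based Euclidean projection onto the probability simplex):

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex
    {x : x >= 0, sum(x) = 1}, via the usual sort-based algorithm."""
    u = np.sort(v)[::-1]                       # sort descending
    css = np.cumsum(u)
    cond = u + (1.0 - css) / (np.arange(len(v)) + 1) > 0
    rho = np.nonzero(cond)[0][-1]              # last index satisfying the condition
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

# Hypothetical validation accuracies p_i (illustrative only).
acc = np.array([0.85, 0.70, 0.65])

# Theorem 4.2 (Kuncheva, 2004): optimal weights w_i = log(p_i / (1 - p_i)).
w = np.log(acc / (1.0 - acc))

# Convert the weights to a probability vector via simplex projection.
w_prob = project_to_simplex(w)

# Per-classifier probabilities of "relevant" for one test article.
probs = np.array([0.90, 0.40, 0.55])

# Final score: each classifier's probability weighted by its simplex weight.
final = float(w_prob @ probs)
print(w_prob, final)
```

Note that the projection can drive the weight of a weak classifier all the way to zero (as it does for the third classifier with these illustrative accuracies), which is a property of the Euclidean projection rather than of the log-odds weights themselves.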
