Machine Learning – How to Combine Predictions from Ensemble Learning Models for Enhanced Accuracy

Tags: accuracy, ensemble learning, heuristic, machine learning

Suppose I have three models for binary classification: model-1, model-2, and model-3, with accuracies $a_1$, $a_2$, and $a_3$ respectively.

For some test instance, model-1 classifies it as $1$, whereas model-2 and model-3 classify it as $0$.

What are the best heuristic ways to combine these results and decide the final classification?

One naive way to do this would be:

We can create a map $M$ where $M[1]=1$ and $M[0]=-1$. Let us denote the result of model-1 as $r_1$, of model-2 as $r_2$, and of model-3 as $r_3$, and define $S = a_1 M[r_1] + a_2 M[r_2] + a_3 M[r_3]$. If $S > 0$ we declare the final result to be $1$, else $0$.
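For concreteness, here is a minimal Python sketch of this weighted-vote heuristic (the function name and the toy accuracies are my own, not from any particular library):

```python
# Weighted-vote heuristic: map labels 0/1 to -1/+1, weight each vote by
# the model's accuracy, and take the sign of the weighted sum.
def weighted_vote(results, accuracies):
    M = {0: -1, 1: 1}
    S = sum(a * M[r] for r, a in zip(results, accuracies))
    return 1 if S > 0 else 0

# Example: model-1 votes 1, models 2 and 3 vote 0.
print(weighted_vote([1, 0, 0], [0.8, 0.7, 0.6]))  # -> 0, since 0.8 < 0.7 + 0.6
```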

The problem with this is that it won't give accurate results with a large number of models. Suppose I have $100$ models, where the first model has an accuracy of $0.9$ and the other $99$ models each have an accuracy of $0.1$. The $99$ weak models then carry a combined weight of $99 \times 0.1 = 9.9$, which swamps the strong model's weight of $0.9$, so the final result may well be wrong.

One way to resolve this is to only allow those models whose accuracy is at least $H - 0.1$, where $H$ is the highest accuracy of any model.
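A small sketch of this thresholding idea, assuming the predictions and accuracies are kept in parallel lists (the helper name and margin default are hypothetical):

```python
# Hypothetical helper: drop models whose accuracy falls more than `margin`
# below the best model's accuracy H, then vote with the survivors.
def filter_models(results, accuracies, margin=0.1):
    H = max(accuracies)
    kept = [(r, a) for r, a in zip(results, accuracies) if a >= H - margin]
    return kept  # feed these (result, accuracy) pairs into the weighted vote above
```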

What are some of the best ways to combine the results from different models? Are there any Python libraries for this? I have found some ways to do ensemble learning, but I want each model's weight to be proportional to its accuracy (based on previous performance on test data), rather than doing just some kind of plain averaging.

Best Answer

First off, don't use accuracy, which is a seriously misleading evaluation measure. Instead, use probabilistic classifications, and evaluate these using proper scoring rules.
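As a quick illustration, here is how you might compute two common proper scoring rules, the Brier score and the log loss, with scikit-learn (the labels and probabilities below are toy values):

```python
from sklearn.metrics import brier_score_loss, log_loss

y_true = [0, 1, 1, 0, 1]             # true binary labels (toy data)
p_hat  = [0.1, 0.8, 0.6, 0.3, 0.9]   # predicted probabilities for class 1

print(brier_score_loss(y_true, p_hat))  # lower is better
print(log_loss(y_true, p_hat))          # lower is better
```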

You can easily combine probabilistic classifications. If your three models yield predicted probabilities $\hat{p}_1$, $\hat{p}_2$ and $\hat{p}_3$ that a new instance belongs to the target class, simply take the average of the $\hat{p}_i$ as the combined probabilistic prediction.
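A minimal sketch of this averaging, with hypothetical predicted probabilities:

```python
import numpy as np

# Hypothetical p-hat_1, p-hat_2, p-hat_3 for one new instance:
p_hat = np.array([0.7, 0.2, 0.4])
p_combined = p_hat.mean()        # unweighted average -> 0.433...
label = int(p_combined > 0.5)    # threshold only if a hard label is required
```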

If your models differ in their performance on some proper scoring rule, then you can weight them based on these scores. Or you could even run a logistic regression of your target on the in-sample probabilistic classifications of your three models.
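Here is one way such a logistic-regression combination might look with scikit-learn; the base-model probabilities are simulated stand-ins here, not real model outputs:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
y_train = rng.integers(0, 2, size=200)                 # toy binary target
# Simulated stand-ins for the three models' in-sample predicted probabilities,
# one column per model (in practice, use each model's actual predictions):
P = np.clip(0.5 * y_train[:, None] + 0.5 * rng.random((200, 3)), 0.0, 1.0)

meta = LogisticRegression().fit(P, y_train)            # learns the combination
p_new = meta.predict_proba([[0.7, 0.2, 0.4]])[0, 1]    # combined probability
```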

Note, however, that any attempt to estimate weights for combinations will introduce additional variance, and the end result may well be worse than if you had used a simple unweighted average (e.g., Claeskens et al., 2016).