Solved – How to proceed with building an ensemble classifier using Naive Bayes, TAN and Logistic Regression in R

classification, ensemble learning, machine learning, predictive-models, r

I'm relatively new to machine learning (started about 5 months ago), and I'm looking at potentially implementing an ensemble classifier as part of my research.

I have built 3 models that I use to classify whether a sale is going to be won or lost. Each model produces the probability of the sale being won or lost, and I then apply thresholds to those probabilities to classify each sale as either a "Win", "Loss" or "Borderline Loss". There are 25 variables, all of which are discrete.

The three models are Naive Bayes, Tree-Augmented Naive Bayes (TAN) and Logistic Regression. I am using the bnlearn package for the Bayesian classifiers, and a simple glm for the Logistic Regression. All models achieve high accuracy when tested on unseen data:

Naive Bayes Accuracy: 88%

TAN Accuracy: 91%

Logistic Regression Accuracy: 92%
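
For reference, a minimal sketch of how a setup like this could be coded in R. The data frame name `sales`, the outcome column `outcome` (a factor with levels "Loss" and "Win"), the 70/30 split and the seed are assumptions made for illustration, not details taken from the post:

```r
library(bnlearn)

# Assumed setup (illustrative only): 'sales' is a data frame in which all
# 25 predictors are factors and 'outcome' is a factor with levels
# "Loss" and "Win".
set.seed(42)
train_idx <- sample(nrow(sales), round(0.7 * nrow(sales)))
train <- sales[train_idx, ]
test  <- sales[-train_idx, ]

# Naive Bayes and TAN structures from bnlearn, then parameter fitting
nb_fit  <- bn.fit(naive.bayes(train, training = "outcome"), train)
tan_fit <- bn.fit(tree.bayes(train, training = "outcome"), train)

# Logistic regression on the same predictors
lr_fit <- glm(outcome ~ ., data = train, family = binomial)

# Predicted probability of "Win" on the held-out data; bnlearn attaches the
# posterior class probabilities as the "prob" attribute when prob = TRUE
nb_prob  <- attr(predict(nb_fit,  test, prob = TRUE), "prob")["Win", ]
tan_prob <- attr(predict(tan_fit, test, prob = TRUE), "prob")["Win", ]
lr_prob  <- predict(lr_fit, newdata = test, type = "response")  # P("Win"), the second level
```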

I want to try implementing an ensemble classifier to see if I can get the best possible accuracy across all three models. My question is, how do I go about implementing something like this? I can't find many examples online, at least not ones that build an ensemble from these particular models. From what I have read, one way to do it is a voting system: if 2 models predict the sale will win but 1 predicts it will lose, it is classified as a win. But what happens in this case if all 3 models give different predictions? I have all my prediction data ready, in the sense that I have all the test data and each model's prediction for each sale, so my question is, how would I proceed from here?
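
For concreteness, a minimal sketch of the majority-vote idea described above. The label vectors are hypothetical examples, and the fallback rule for a three-way split (defer to the strongest single model) is just one possible choice, not something prescribed by the post:

```r
# Hypothetical example labels per sale from each model; in practice these
# would be the thresholded predictions already computed for the test set.
nb_class  <- c("Win", "Loss", "Win")
tan_class <- c("Win", "Borderline Loss", "Loss")
lr_class  <- c("Win", "Loss", "Borderline Loss")

# Majority vote; when all three models disagree, fall back to one rule,
# here the prediction of the strongest single model (logistic regression).
combine_votes <- function(nb, tan, lr) {
  tab <- sort(table(c(nb, tan, lr)), decreasing = TRUE)
  if (max(tab) >= 2) names(tab)[1] else lr
}

ensemble_class <- mapply(combine_votes, nb_class, tan_class, lr_class,
                         USE.NAMES = FALSE)
# "Win" "Loss" "Borderline Loss"  (the third sale is decided by the fallback)
```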

If someone knows of any available resources or tutorials that may help, I would greatly appreciate it!

Best Answer

You're discarding information by taking continuous predictions and turning them into categories. I would strongly advise against taking that path. Instead, you could (1) average or (2) stack the models. Averaging is straightforward and requires no additional tuning; on the other hand, all models are weighted equally, which might be suboptimal. Stacking adds another layer of learning: a second-level model is trained on the base models' out-of-sample predictions against the true outcomes.
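
To make the first suggestion concrete, a minimal sketch of equal-weight averaging, reusing the `nb_prob`, `tan_prob` and `lr_prob` vectors from the sketch in the question; the 0.4/0.6 cut-offs are placeholders, not values from the post:

```r
# Equal-weight average of the three models' predicted probabilities of "Win"
avg_prob <- (nb_prob + tan_prob + lr_prob) / 3

# Thresholds are then applied once, to the averaged probability
# (cut-offs here are placeholders, not the values used in the original models)
avg_class <- cut(avg_prob,
                 breaks = c(-Inf, 0.4, 0.6, Inf),
                 labels = c("Loss", "Borderline Loss", "Win"))
```

And a sketch of stacking with a second-level logistic regression. The key requirement is that the meta-model is trained on base-model predictions for data the base models did not see; here the held-out test predictions from the earlier sketch are reused purely for illustration, whereas in practice cross-validated predictions and a separate final evaluation set would be preferable:

```r
# Base-model probabilities on held-out data, plus the true outcomes
stack_data <- data.frame(nb = nb_prob, tan = tan_prob, lr = lr_prob,
                         outcome = test$outcome)

# The second-level model learns how much weight to give each base model
stacker <- glm(outcome ~ nb + tan + lr, data = stack_data, family = binomial)

# Ensemble probability of "Win" for new sales, given their base-model scores
new_scores <- head(stack_data[, c("nb", "tan", "lr")])  # placeholder new data
stack_prob <- predict(stacker, newdata = new_scores, type = "response")
```

If you would rather not hand-roll the stacking step, packages such as caretEnsemble and SuperLearner provide ready-made implementations of model averaging and stacking in R.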