Solved – How to proceed with building an ensemble classifier using Naive Bayes, TAN and Logistic Regression in R

classification, ensemble learning, machine learning, predictive-models, r

I'm relatively new to machine learning (started about 5 months ago), and I'm looking at potentially implementing an ensemble classifier as part of my research.

I have built 3 models that I use to classify whether a sale is going to be won or lost. Each model produces the probability of the sale being won or lost, and I then apply thresholds to those probabilities to classify each sale as either a "Win", "Loss" or "Borderline Loss". There are 25 variables, all of which are discrete.

The three models are Naive Bayes, Tree-Augmented Naive Bayes (TAN) and Logistic Regression. I am using the bnlearn package for the Bayesian classifiers, and a simple glm for the Logistic Regression. All models achieve high accuracy when tested on unseen data:

Naive Bayes Accuracy: 88%

TAN Accuracy: 91%

Logistic Regression Accuracy: 92%
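
For reference, a minimal sketch of how a setup like this could be coded in R. The data frame name `sales`, the outcome column `outcome` (a factor with levels "Loss" and "Win"), the 70/30 split and the seed are assumptions made for illustration, not details taken from the post:

```r
library(bnlearn)

# Assumed setup (illustrative only): 'sales' is a data frame in which all
# 25 predictors are factors and 'outcome' is a factor with levels
# "Loss" and "Win".
set.seed(42)
train_idx <- sample(nrow(sales), round(0.7 * nrow(sales)))
train <- sales[train_idx, ]
test  <- sales[-train_idx, ]

# Naive Bayes and TAN structures from bnlearn, then parameter fitting
nb_fit  <- bn.fit(naive.bayes(train, training = "outcome"), train)
tan_fit <- bn.fit(tree.bayes(train, training = "outcome"), train)

# Logistic regression on the same predictors
lr_fit <- glm(outcome ~ ., data = train, family = binomial)

# Predicted probability of "Win" on the held-out data; bnlearn attaches the
# posterior class probabilities as the "prob" attribute when prob = TRUE
nb_prob  <- attr(predict(nb_fit,  test, prob = TRUE), "prob")["Win", ]
tan_prob <- attr(predict(tan_fit, test, prob = TRUE), "prob")["Win", ]
lr_prob  <- predict(lr_fit, newdata = test, type = "response")  # P("Win"), the second level
```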

I want to try implementing an ensemble classifier to see if I can get the best possible accuracy across all three models. My question is, how do I go about implementing something like this? I can't find many examples online, at least not ones that build an ensemble from these particular models. From what I have read, one way to do it is a voting system: if 2 models predict the sale will win but 1 predicts it will lose, it is classified as a win. But what happens in this case if all 3 models give different predictions? I have all my prediction data ready, in the sense that I have all the test data and each model's prediction for each sale, so my question is, how would I proceed from here?
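
For concreteness, a minimal sketch of the majority-vote idea described above. The label vectors are hypothetical examples, and the fallback rule for a three-way split (defer to the strongest single model) is just one possible choice, not something prescribed by the post:

```r
# Hypothetical example labels per sale from each model; in practice these
# would be the thresholded predictions already computed for the test set.
nb_class  <- c("Win", "Loss", "Win")
tan_class <- c("Win", "Borderline Loss", "Loss")
lr_class  <- c("Win", "Loss", "Borderline Loss")

# Majority vote; when all three models disagree, fall back to one rule,
# here the prediction of the strongest single model (logistic regression).
combine_votes <- function(nb, tan, lr) {
  tab <- sort(table(c(nb, tan, lr)), decreasing = TRUE)
  if (max(tab) >= 2) names(tab)[1] else lr
}

ensemble_class <- mapply(combine_votes, nb_class, tan_class, lr_class,
                         USE.NAMES = FALSE)
# "Win" "Loss" "Borderline Loss"  (the third sale is decided by the fallback)
```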

If someone knows of any available resources or tutorials that may help, I would greatly appreciate it!

Best Answer

You're discarding information by taking continuous predictions and turning them into categories. I would strongly advise against taking that path. Instead, you could (1) average or (2) stack the models. Averaging is straightforward and requires no additional tuning; on the other hand, all models are weighted equally, which might be suboptimal. Stacking adds another layer of learning: a second-level model is trained on the base models' out-of-sample predictions against the true outcomes.
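
To make the first suggestion concrete, a minimal sketch of equal-weight averaging, reusing the `nb_prob`, `tan_prob` and `lr_prob` vectors from the sketch in the question; the 0.4/0.6 cut-offs are placeholders, not values from the post:

```r
# Equal-weight average of the three models' predicted probabilities of "Win"
avg_prob <- (nb_prob + tan_prob + lr_prob) / 3

# Thresholds are then applied once, to the averaged probability
# (cut-offs here are placeholders, not the values used in the original models)
avg_class <- cut(avg_prob,
                 breaks = c(-Inf, 0.4, 0.6, Inf),
                 labels = c("Loss", "Borderline Loss", "Win"))
```

And a sketch of stacking with a second-level logistic regression. The key requirement is that the meta-model is trained on base-model predictions for data the base models did not see; here the held-out test predictions from the earlier sketch are reused purely for illustration, whereas in practice cross-validated predictions and a separate final evaluation set would be preferable:

```r
# Base-model probabilities on held-out data, plus the true outcomes
stack_data <- data.frame(nb = nb_prob, tan = tan_prob, lr = lr_prob,
                         outcome = test$outcome)

# The second-level model learns how much weight to give each base model
stacker <- glm(outcome ~ nb + tan + lr, data = stack_data, family = binomial)

# Ensemble probability of "Win" for new sales, given their base-model scores
new_scores <- head(stack_data[, c("nb", "tan", "lr")])  # placeholder new data
stack_prob <- predict(stacker, newdata = new_scores, type = "response")
```

If you would rather not hand-roll the stacking step, packages such as caretEnsemble and SuperLearner provide ready-made implementations of model averaging and stacking in R.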