Solved – Is it possible to combine predictions to improve overall prediction quality?

boosting, machine learning, prediction

This is a binary classification problem. The metric being minimised is the log loss (or cross-entropy). I also track accuracy, just for my own information. It is a large, very balanced data set. Very naive prediction techniques get about 50% accuracy and 0.693 log loss. The best I've been able to scrape out is 52.5% accuracy and 0.6915 log loss. Since we are trying to minimise the log loss, we always get a set of probabilities (the predict_proba functions in sklearn and keras). That's all background; now the question.

Let's say I can use two different techniques to create two different sets of predictions with comparable accuracy and log-loss metrics. For example, I can use two different groups of the input features to produce two sets of predictions that are both about 52% accurate with a log loss below 0.692. The point is that both sets of predictions show there is some predictive power. Another example: I could use logistic regression to produce one set of predictions and a neural net to produce the other.
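For concreteness, here is roughly what that setup looks like in sklearn. This is only an illustrative sketch: the synthetic data and the arbitrary split into two feature groups stand in for my real data and feature groups.

```python
# Illustrative sketch: two models fit on two feature groups, each producing
# its own probability predictions (synthetic data, arbitrary feature split).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, accuracy_score

X, y = make_classification(n_samples=20000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Two arbitrary feature groups standing in for the real ones.
cols_a, cols_b = np.arange(10), np.arange(10, 20)

p1 = LogisticRegression().fit(X_train[:, cols_a], y_train).predict_proba(X_val[:, cols_a])[:, 1]
p2 = LogisticRegression().fit(X_train[:, cols_b], y_train).predict_proba(X_val[:, cols_b])[:, 1]

print(log_loss(y_val, p1), accuracy_score(y_val, (p1 > 0.5).astype(int)))
print(log_loss(y_val, p2), accuracy_score(y_val, (p2 > 0.5).astype(int)))
```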

Here are the first 10 for each set, for example:

p1 = [0.49121362 0.52067905 0.50230295 0.49511673 0.52009695 0.49394751 0.48676686 0.50084939 0.48693237 0.49564188 ...]
p2 = [0.4833959  0.49700296 0.50484381 0.49122147 0.52754993 0.51766402 0.48326918 0.50432501 0.48721228 0.48949306 ...]

I'm thinking there should be a way to combine the two sets of predictions into one to increase the overall predictive power. Is there?

I had started trying some things. For example, I treated the absolute distance of a prediction from 0.5 (abs(p - 0.5)) as a signal strength, and for each sample I kept whichever of p1 and p2 had the greater signal. This accomplished what I wanted, but only by a slim margin, and in another instance it didn't seem to help at all. Interestingly, it didn't seem to destroy the predictive power either.
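In case it is unclear, here is a minimal sketch of that picking rule (the function name is just illustrative):

```python
# "Stronger signal wins": for each sample, keep whichever prediction
# lies further from 0.5, i.e. has the larger abs(p - 0.5).
import numpy as np

def combine_by_signal(p1, p2):
    p1, p2 = np.asarray(p1), np.asarray(p2)
    return np.where(np.abs(p1 - 0.5) >= np.abs(p2 - 0.5), p1, p2)

# With the first few values from above, this picks
# [0.4833959, 0.52067905, 0.50484381, 0.49122147, 0.52754993].
print(combine_by_signal(
    [0.49121362, 0.52067905, 0.50230295, 0.49511673, 0.52009695],
    [0.4833959, 0.49700296, 0.50484381, 0.49122147, 0.52754993],
))
```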

Best Answer

Short answer: Yes.

Long answer: This is one of many examples of a technique known as "stacking". While you can, of course, decide on some manual way to combine the two predictions, it usually works even better to train a third model on the outputs of the first two models (or of even more models). This can further improve the accuracy. To avoid re-using the same data, one part of the data set is often used to train the first-level models and a different part to train the model that combines their predictions.
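For illustration, here is a minimal stacking sketch (on synthetic data) using sklearn's built-in StackingClassifier, which implements a closely related variant: instead of a separate held-out split, it fits the combining model on out-of-fold predict_proba outputs of the base models, which likewise avoids re-using the same rows at both levels.

```python
# Minimal stacking sketch: a logistic regression and a small neural net
# as base models, with another logistic regression combining their
# out-of-fold probability predictions (synthetic data for illustration).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import StackingClassifier
from sklearn.metrics import log_loss

X, y = make_classification(n_samples=20000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[("lr", LogisticRegression()),
                ("nn", MLPClassifier(max_iter=500, random_state=0))],
    final_estimator=LogisticRegression(),
    stack_method="predict_proba",  # feed probabilities, not hard labels
    cv=5,  # meta-model is trained on out-of-fold predictions
)
stack.fit(X_train, y_train)
print(log_loss(y_val, stack.predict_proba(X_val)[:, 1]))
```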

See e.g. here for an example.