Solved – Running a Decision Tree to measure model accuracy per prediction

accuracy, cart, confidence interval, machine learning, validation

I would like to have a measure of uncertainty for each prediction I will make on my new data. Is the procedure I have in mind reasonable?

Suppose I am modelling binary classification data. I have fit a model on training data and run it on my test data. Now, for each prediction on unseen new data, I want a measure of how uncertain that prediction is. By 'each prediction' I mean that I am not interested in some general performance metric (like accuracy) obtained on my test data, nor in a confidence interval based only on the prediction level itself. I want something that tells me the prediction for one specific observation is probably not trustworthy because the segment it belongs to showed poor predictive power, while the prediction for another observation is probably highly accurate because its segment performed very well, there is no missing data, and so on.

What I have in mind to address this is to fit a simple decision tree on my test data (a rough code sketch follows the list). I would:

1- Choose a threshold (e.g. 80%).

2- Keep only the test cases where my predicted probability was higher than 80%.

3- Flag all my correct and incorrect predictions.

4- Run a decision tree on this filtered test data, using all modelled variables + the predicted probability as explanatory variables and the flag of incorrect predictions as the response variable.

5- Apply this model to my new unseen data, together with my original model, and ignore predictions with a high estimated probability of misclassification.
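A minimal sketch of what I mean, using scikit-learn and hypothetical numpy arrays X_train, y_train, X_test, y_test and X_new (all names are placeholders, and the random forest just stands in for whatever the original model is):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier   # stand-in for the original model
from sklearn.tree import DecisionTreeClassifier

# Original model, fit on the training data (any classifier works here).
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Steps 1-2: score the test set and keep only the confident cases (threshold 80%).
proba_test = model.predict_proba(X_test)
pred_test = model.predict(X_test)
conf_test = proba_test.max(axis=1)          # confidence of the predicted class
keep = conf_test > 0.8

# Step 3: flag incorrect predictions among the confident cases.
wrong = (pred_test[keep] != y_test[keep]).astype(int)

# Step 4: fit a simple "error model" on the modelled variables + predicted probability.
# (Assumes both correct and incorrect cases survive the filter.)
X_err = np.column_stack([X_test[keep], conf_test[keep]])
error_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_err, wrong)

# Step 5: on new data, run both models and drop predictions the error model flags.
conf_new = model.predict_proba(X_new).max(axis=1)
risk = error_tree.predict_proba(np.column_stack([X_new, conf_new]))[:, 1]
trusted = risk < 0.5   # ignore predictions with a high estimated misclassification risk
```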

Does this sound appropriate, or am I introducing some bias? Are there less computationally expensive ways to do this with the same robustness?

I would appreciate your feedback, and also any pointers to work using a similar approach, because I couldn't find any.

Cheers

Best Answer

First of all, you are in the territory of problems that some very smart people have worked on before, which makes it almost impossible to invent something better than the rest of the community without an intense study of the existing literature. Trust me, I have tried.

In order to get good estimates of confidence you need to look at ensemble models: if you want to have confidence in an opinion, ask many experts. There are two main families here, both of which give very good scores.

Random Forest - take the average of many deep decision trees, each trained on a bootstrap sample of the data. Typically the first benchmark machine learning model (sketched in code below).

Gradient Boosted Decision Trees - train each new decision tree on the residuals of the previous ones, thereby putting higher emphasis on previously badly modelled observations. Very often the best-performing model on the data science competition site kaggle.com.
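To make the "ask many experts" idea concrete, here is a minimal sketch (scikit-learn, with hypothetical X_train, y_train, X_new arrays) of how the disagreement between the individual trees of a random forest gives a per-prediction uncertainty on top of the averaged score:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each tree is one "expert"; the forest score is the average of their votes.
forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_train, y_train)

# Probability of the positive class from every individual tree.
per_tree = np.stack([t.predict_proba(X_new)[:, 1] for t in forest.estimators_])

mean_score = per_tree.mean(axis=0)   # the usual forest probability
spread = per_tree.std(axis=0)        # tree disagreement = per-prediction uncertainty
```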

So my advice would be to use a Random Forest or a gradient booster and trust the scores. The gradient booster is probably the closest model to your idea.
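And a similarly minimal sketch for the gradient-boosting route (again scikit-learn with hypothetical arrays), using the predicted probability itself as the per-prediction confidence and thresholding it much like the 80% filter in your question:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Each new tree is fit to the residual errors of the ensemble built so far.
booster = GradientBoostingClassifier(n_estimators=300, learning_rate=0.05,
                                     random_state=0).fit(X_train, y_train)

proba_new = booster.predict_proba(X_new)[:, 1]
confidence = np.maximum(proba_new, 1.0 - proba_new)   # distance from the 0.5 boundary

trusted = confidence > 0.8   # only act on predictions the booster scores as confident
```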