MATLAB: How to get probabilities of each class which is classified with RUSBoost for an imbalanced data set

fitcensemble, rusboost, scores

I have a dataset with 7 classes and 3 features. The data set is hugely imbalanced, so I followed https://www.mathworks.com/help/stats/classification-with-imbalanced-data.html to classify the data. I get a prediction accuracy of 94%, but I also need the probability of each class for a given feature or set of features. How can I get the probability of each class for a given feature vector?
Nt = size(y,1); % Number of observations in the training sample
t = templateTree('MaxNumSplits',Nt);
rusTree = fitcensemble(X,y,'Method','RUSBoost', 'NumLearningCycles',1000,'Learners',t,'LearnRate',0.1,'nprint',100);
[~,scores] = predict(rusTree,[1 16 3 5])
I get the following scores from the code above: 0.7345, 3.5105, 1.1893, 0, 0, 0, 0.0082
These scores are not probabilities. How can I get values between 0 and 1 such that the probabilities over all classes sum to 1?

Best Answer

Hi,
predict does not return probability estimates because the RUSBoost algorithm used in the model does not treat scores as probabilities. Instead, a score represents the confidence of a classification into a class, with a higher score indicating greater confidence, as explained in the documentation of fitcensemble.
If you would like probabilistic estimates for the scores, you can set 'ScoreTransform' to 'logit' in fitcensemble. This name-value pair applies the logistic function 1/(1+e^(-x)) to each score, so every returned value lies between 0 and 1. This is explained here. Calling predict on the model then returns the transformed scores for each class. Note that the transform is applied to each class score separately, so the values for one observation are not guaranteed to sum to 1.
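As a rough illustration of the suggestion above, the sketch below trains a RUSBoost ensemble with 'ScoreTransform' set to 'logit' and then rescales each row of scores so that it sums to 1. The synthetic X and y, the query point, and the final normalization step are assumptions added for this example and are not part of the original question or answer. The rescaled values are a simple heuristic, not calibrated probabilities. The code requires Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical imbalanced data standing in for the asker's X and y
rng(0);                                        % make the example reproducible
X = [randn(300,3); randn(40,3)+2; randn(10,3)-2];
y = [ones(300,1); 2*ones(40,1); 3*ones(10,1)]; % 3 classes: 300 / 40 / 10 rows

% Same setup as in the question, plus the 'logit' score transform
t = templateTree('MaxNumSplits', size(X,1));
rusTree = fitcensemble(X, y, 'Method','RUSBoost', ...
    'NumLearningCycles',100, 'Learners',t, 'LearnRate',0.1, ...
    'ScoreTransform','logit');

% Each score is now 1/(1+exp(-rawScore)), so it lies between 0 and 1
xNew = [0.1 -0.3 0.2];                         % hypothetical query point
[label, scores] = predict(rusTree, xNew);

% Assumption: rescale each row so that the values sum to 1
probs = scores ./ sum(scores, 2);
disp(probs)
```

The transform can also be changed on an already trained model by assigning rusTree.ScoreTransform = 'logit', which avoids retraining.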