The answer depends on how you define a "fair" classifier. If your goal is to minimize the overall classification error and the class proportions in the training set are representative of the real world, training on the imbalanced data as-is gives you the appropriate classifier. If the class proportions in the training set are not what you expect in practice, or if you want to assign different misclassification costs to the majority and minority classes, you need to adjust your learning method accordingly.
In general, there are four ways of dealing with skewed data:
1. Adjusting class prior probabilities to reflect realistic proportions.
2. Adjusting misclassification costs to represent realistic penalties.
3. Oversampling the minority class.
4. Undersampling the majority class.
For binary classification, strategies 1 and 2 are equivalent.
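To see why, here is a short sketch of the minimum-expected-cost decision rule for two classes. The notation is assumed for illustration and does not come from any particular toolbox: \pi_k is the prior of class k, c_k is the cost of misclassifying an observation whose true class is k, and p(x | k) is the class-conditional density.

```latex
% Predicting class 1 is cheaper in expectation exactly when:
\[
  c_1 \, \pi_1 \, p(x \mid 1) \;>\; c_0 \, \pi_0 \, p(x \mid 0)
\]
% Only the products c_k \pi_k enter the rule, so the same decisions follow
% from unit costs combined with rescaled priors:
\[
  \pi_k' = \frac{c_k \, \pi_k}{c_0 \, \pi_0 + c_1 \, \pi_1},
  \qquad c_k' = 1 .
\]
```

With more than two classes the cost matrix has more free entries than the prior vector, so this reduction does not hold in general.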
If you use fitensemble or TreeBagger, the easiest approach is to set 'prior' to 'uniform' for an equal mix of classes, or to any other class probabilities you consider realistic.
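As a minimal sketch of what that looks like, the snippet below trains both kinds of ensemble with uniform priors. The data set is synthetic and made up for illustration (950 majority and 50 minority observations), and the choice of AdaBoostM1 with 100 trees is an arbitrary assumption; it requires Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical imbalanced data: 950 majority (label 0), 50 minority (label 1).
rng(0);
X = [randn(950,2); randn(50,2) + 2];
Y = [zeros(950,1); ones(50,1)];

% Boosted trees trained as if both classes were equally likely.
ens = fitensemble(X, Y, 'AdaBoostM1', 100, 'Tree', 'prior', 'uniform');

% Bagged trees with the same uniform prior.
bag = TreeBagger(100, X, Y, 'Method', 'classification', 'Prior', 'Uniform');

% Predicted labels on the training data (fitensemble returns numeric labels here).
labelsEns = predict(ens, X);
```

To use proportions other than an equal mix, the 'prior' argument also accepts a numeric vector with one entry per class.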
If you prefer oversampling or undersampling, nothing is available out of the box in official MATLAB. It would not be too hard to code, though.
For undersampling the majority class, I have personally had good experience with RUSBoost:
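A rough sketch of plain random undersampling and oversampling, using only base MATLAB functions, could look like this. The data and the variable names are hypothetical, and balancing the classes exactly 1:1 is just one possible target ratio.

```matlab
% Hypothetical imbalanced data: 950 majority (label 0), 50 minority (label 1).
rng(1);
X = [randn(950,2); randn(50,2) + 2];
Y = [zeros(950,1); ones(50,1)];

idxMaj = find(Y == 0);
idxMin = find(Y == 1);

% Random undersampling: keep a random subset of the majority class
% of the same size as the minority class (sampling without replacement).
keep     = idxMaj(randperm(numel(idxMaj), numel(idxMin)));
idxUnder = [keep; idxMin];
Xunder   = X(idxUnder,:);
Yunder   = Y(idxUnder);

% Random oversampling: keep all rows and append minority rows drawn
% with replacement until both classes have the same size.
nExtra  = numel(idxMaj) - numel(idxMin);
extra   = idxMin(randi(numel(idxMin), nExtra, 1));
idxOver = [(1:numel(Y))'; extra];
Xover   = X(idxOver,:);
Yover   = Y(idxOver);
```

Either resampled set can then be passed to the learner in place of X and Y. Resample the training data only, so that the test set keeps the original class proportions.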
Seiffert, C., Khoshgoftaar, T., Van Hulse, J., and Napolitano, A. (2008) RUSBoost: Improving classification performance when training data is skewed, in International Conference on Pattern Recognition, pp. 1–4.
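If your release of Statistics and Machine Learning Toolbox lists 'RUSBoost' among the fitensemble methods (an assumption about your installed version, so check the documentation), a minimal call might look like the sketch below. The synthetic data and the choice of 100 trees are illustrative only.

```matlab
% Hypothetical imbalanced data: 950 majority (label 0), 50 minority (label 1).
rng(3);
X = [randn(950,2); randn(50,2) + 2];
Y = [zeros(950,1); ones(50,1)];

% RUSBoost undersamples the majority class at each boosting iteration.
rus = fitensemble(X, Y, 'RUSBoost', 100, 'Tree');

% Confusion matrix on the training data: rows are true classes,
% columns are predicted classes.
labels = predict(rus, X);
C = confusionmat(Y, labels);
```

For a realistic assessment, compute the confusion matrix on held-out data rather than on the training set.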
For oversampling the minority class, a popular method is SMOTE. You might also want to look into its boosting extension, SMOTEBoost.
Chawla, N., Bowyer, K., Hall, L., and Kegelmeyer, W. (2002) SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357.
Chawla, N., Lazarevic, A., Hall, L., and Bowyer, K. (2003) SMOTEBoost: Improving prediction of the minority class in boosting, in VIIth European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD '03), Lecture Notes in Computer Science, vol. 2838, Springer-Verlag, pp. 107–119.
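To give a feel for the core idea of SMOTE, here is a simplified sketch in base MATLAB. It is not the reference implementation from the paper: each synthetic point starts from a randomly chosen minority observation rather than looping over all of them, and the values of k and nSyn as well as the data are assumptions made for illustration.

```matlab
% Hypothetical minority-class observations (50 points in 2-D).
rng(2);
Xmin = randn(50,2) + 2;
k    = 5;      % number of nearest minority neighbors to consider
nSyn = 100;    % number of synthetic points to generate

n = size(Xmin,1);

% Pairwise squared Euclidean distances among minority points.
sq = sum(Xmin.^2, 2);
D  = bsxfun(@plus, sq, sq') - 2*(Xmin*Xmin');
D(1:n+1:end) = Inf;           % a point is not its own neighbor

% Indices of the k nearest minority neighbors of each point.
[~, order] = sort(D, 2);
nbrs = order(:, 1:k);

% Each synthetic point lies on the segment between a minority point
% and one of its k nearest minority neighbors.
Xsyn = zeros(nSyn, size(Xmin,2));
for i = 1:nSyn
    p = randi(n);             % random minority point
    q = nbrs(p, randi(k));    % one of its neighbors
    Xsyn(i,:) = Xmin(p,:) + rand * (Xmin(q,:) - Xmin(p,:));
end
```

The rows of Xsyn would then be appended to the training set with the minority label. Because every synthetic point is a convex combination of two real minority points, it stays within the region already covered by the minority class.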