Solved – How do Adasyn and SMOTE handle categorical data, specifically binary features

unbalanced-classes

SMOTE oversamples the minority class by creating synthetic data along the line segment connecting a minority class sample with each (or however many are chosen) of its K nearest neighbors. In other words, x_new = x_old + lambda * (x_neighbor - x_old), where lambda is a random number between 0 and 1. How should this approach be modified when binary features are present?
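To make the problem concrete, here is a minimal sketch of the interpolation step in NumPy. The function name `smote_sample` and the toy vectors are my own illustration, not taken from any library. Note that the second feature is binary in both inputs, yet the synthetic value generally lands strictly between 0 and 1.

```python
import numpy as np

def smote_sample(x_old, x_neighbor, rng):
    """Return one synthetic point on the segment between x_old and x_neighbor."""
    lam = rng.uniform(0.0, 1.0)  # random interpolation weight in [0, 1)
    return x_old + lam * (x_neighbor - x_old)

rng = np.random.default_rng(0)
x_old = np.array([1.0, 0.0])       # second feature is binary (hypothetical data)
x_neighbor = np.array([3.0, 1.0])  # a minority-class neighbor
x_new = smote_sample(x_old, x_neighbor, rng)
print(x_new)  # the binary feature is usually no longer exactly 0 or 1
```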

Best Answer

OK, so I found the answer. In case someone else is interested, here it is: the answer lies in the SMOTE paper (https://www.jair.org/media/953/live-953-2037-jair.pdf) itself. Section 6.1 of the paper presents the SMOTE-NC technique, which describes how mixed data types (nominal and continuous) can be handled.
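For anyone who wants to see the idea in code, below is a rough sketch of SMOTE-NC as I read section 6.1; treat it as an illustration under my own assumptions, not a reference implementation. Continuous features are interpolated as in plain SMOTE, each nominal feature takes the most frequent value among the k nearest neighbors, and every nominal mismatch adds a penalty to the distance equal to the median of the standard deviations of the continuous features in the minority class. The function name `smote_nc_sample`, its arguments, and the toy data are hypothetical, and nominal features are assumed to be encoded as numbers.

```python
import numpy as np

def smote_nc_sample(X_min, i, cont_idx, cat_idx, k, rng):
    """Create one synthetic sample from minority row i (SMOTE-NC style sketch)."""
    cont = X_min[:, cont_idx].astype(float)
    cats = X_min[:, cat_idx]

    # Penalty for a nominal mismatch: median of the continuous features' std devs.
    med = np.median(cont.std(axis=0))

    # Euclidean distance in which each differing nominal feature contributes med**2.
    sq_cont = ((cont - cont[i]) ** 2).sum(axis=1)
    sq_cat = (cats != cats[i]).sum(axis=1) * med ** 2
    dist = np.sqrt(sq_cont + sq_cat)
    dist[i] = np.inf  # exclude the sample itself

    nn = np.argsort(dist)[:k]  # indices of the k nearest minority neighbors
    j = rng.choice(nn)         # neighbor used for interpolation
    lam = rng.uniform(0.0, 1.0)

    new = X_min[i].astype(float).copy()
    # Continuous features: interpolate as in plain SMOTE.
    new[cont_idx] = cont[i] + lam * (cont[j] - cont[i])
    # Nominal features: majority value among the k neighbors (ties go to the smallest value).
    for c in cat_idx:
        vals, counts = np.unique(X_min[nn, c], return_counts=True)
        new[c] = vals[np.argmax(counts)]
    return new

# Toy minority class: two continuous features and one binary feature (made-up values).
X_min = np.array([
    [1.0, 2.0, 0],
    [1.2, 2.1, 0],
    [0.9, 1.8, 1],
    [1.1, 2.2, 0],
    [3.0, 4.0, 1],
])
new = smote_nc_sample(X_min, 0, [0, 1], [2], 3, np.random.default_rng(0))
print(new)  # the last entry stays a valid category (0 or 1)
```

In practice you probably do not need to write this yourself: the imbalanced-learn package ships a `SMOTENC` class that takes a `categorical_features` argument, though I have not compared its details against the paper.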