SMOTE is a popular method to generate synthetic examples of the minority class in an unbalanced-class data set.
I am trying out SMOTE in the "unbalanced" package in R. I generated a simple simulated data set, but SMOTE seems to fail on it, and I am not sure what the problem is.
library(unbalanced)
set.seed(1)
X <- matrix(rnorm(1000), ncol = 2)
X[1:50,] <- X[1:50,]+5
Y <- as.factor(c(rep(1,50), rep(0,450)))
smoted <- ubSMOTE(X,Y,k=1)
#WARNING: NAs generated by SMOTE removed
dim(smoted$X)
#[1] 50 2
I would expect smoted to be a larger data set consisting of the original data plus the synthetic examples. Using other values of k or perc.over makes no difference.
EDIT
When using the SMOTE function in the DMwR library I get the expected result. So the problem seems to be in the unbalanced library.
library(DMwR)
df <- data.frame(X,y=Y)
smoted2 <- SMOTE(y~.,data=df,k=1)
dim(smoted2)
#[1] 350 3
table(smoted2$y)
# 0 1
#200 150
Best Answer
I know it's a very old question, but I just ran into it.
The problem is the following: unbalanced::ubSMOTE() will behave the same as DMwR::SMOTE() if its X input is given as a data.frame(). If you modify your code so that X is passed as a data.frame rather than a matrix, the two functions will produce the same result.
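A minimal sketch of that change, assuming the same simulated X and Y as in the question. The only difference from the original call is that X is wrapped in as.data.frame(); that wrapping is my reading of the fix described above, so treat it as an illustration rather than a verified patch.

```r
library(unbalanced)

# Same simulated data as in the question
set.seed(1)
X <- matrix(rnorm(1000), ncol = 2)
X[1:50, ] <- X[1:50, ] + 5
Y <- as.factor(c(rep(1, 50), rep(0, 450)))

# Assumption: converting the matrix to a data.frame is the whole fix
smoted <- ubSMOTE(X = as.data.frame(X), Y = Y, k = 1)

# Inspect the size and class balance of the resampled data
dim(smoted$X)
table(smoted$Y)
```

If the result matches DMwR::SMOTE(), dim(smoted$X) should report more than the 50 rows seen in the question, and table(smoted$Y) should show both classes.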