Implementing Balanced Random Forest (BRF) in R using randomForest

machine-learning, r, random-forest

Hi, I am developing a fraud prediction model. Because this is a highly unbalanced classification problem, I have chosen to try to solve it with random forests.

Inspired by this article (http://statistics.berkeley.edu/sites/default/files/tech-reports/666.pdf), I have chosen to try Balanced Random Forests.

However, I am not sure how to implement these forests in R.
The article describes the procedure as follows: for each iteration in random forest, draw a bootstrap sample from the minority class, then randomly draw the same number of cases, with replacement, from the majority class.
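The sampling step quoted above can be sketched in base R. This is a minimal illustration, not code from the paper or from any package: `balanced_bootstrap()` is a hypothetical helper, it assumes a two-class factor, and the labels in the usage lines are made up.

```r
# Hypothetical helper (not from the paper or any package): returns the row
# indices for one tree's training sample under the BRF sampling scheme.
# Assumes a two-class factor y.
balanced_bootstrap <- function(y) {
  counts <- table(y)
  minority <- names(counts)[which.min(counts)]
  n_min <- min(counts)
  idx_min <- which(y == minority)
  idx_maj <- which(y != minority)
  # Bootstrap sample of the minority class (n_min draws with replacement)
  draw_min <- idx_min[sample.int(length(idx_min), n_min, replace = TRUE)]
  # The same number of majority-class cases, also drawn with replacement
  draw_maj <- idx_maj[sample.int(length(idx_maj), n_min, replace = TRUE)]
  c(draw_min, draw_maj)
}

# Usage on made-up labels: 90 "no", 10 "yes"
set.seed(1)
y <- factor(c(rep("no", 90), rep("yes", 10)))
idx <- balanced_bootstrap(y)
table(y[idx])  # 10 "no" and 10 "yes"
```

Each tree would then be grown on the rows in `idx`; indexing through `sample.int()` rather than calling `sample()` on the index vector avoids R's special case when a class has only one row.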

Is this achieved by passing these arguments to randomForest()?

replace = TRUE
strata = fraud.variable
sampsize = c(x, x)   # x is the number of cases to draw from each class
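For what it's worth, the randomForest documentation describes `strata` as the factor used for stratified sampling, and says that when `sampsize` is a vector whose length is the number of strata, its elements are the numbers of cases drawn from each stratum. That suggests the arguments above are the right ones, with x set to the minority-class size. A minimal sketch on made-up data (the data frame `X`, the factor `fraud` and `n_min` are hypothetical names for illustration):

```r
library(randomForest)

set.seed(43)
# Hypothetical toy data: 1000 rows, 10 numeric predictors, about 10% "yes"
n <- 1000
X <- as.data.frame(matrix(rnorm(n * 10), ncol = 10))
score <- as.vector(as.matrix(X) %*% rnorm(10)) + rnorm(n)
fraud <- factor(ifelse(score > quantile(score, 0.9), "yes", "no"))

# Size of the minority class; each tree draws this many cases per class
n_min <- min(table(fraud))

rf <- randomForest(
  x = X, y = fraud,
  ntree = 500,
  replace = TRUE,                         # sample with replacement within each class
  strata = fraud,                         # stratify each tree's sample by class
  sampsize = rep(n_min, nlevels(fraud))   # equal count per class, in levels() order
)
print(rf)
```

The unnamed `sampsize` vector is matched to the strata in the order of `levels(fraud)`; since both entries are equal here, the order does not matter.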

Best Answer

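You can balance your random forests using case weights. In ranger, case.weights sets how likely each observation is to be drawn into a tree's bootstrap sample, so weighting each class by the inverse of its frequency makes the two classes equally likely to be drawn. Here's a simple example: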

library(ranger)  # Best random forest implementation in R

# Make a dataset
set.seed(43)
nrow <- 1000
ncol <- 10
X <- matrix(rnorm(nrow * ncol), ncol=ncol)
CF <- rnorm(ncol)
Y <- (X %*% CF + rnorm(nrow))[,1]
Y <- as.integer(Y > quantile(Y, 0.90))
table(Y)

# Compute weights to balance the RF: inverse class frequency, so that
# both classes carry the same total weight
w <- 1/table(Y)
w <- w/sum(w)
weights <- rep(0, nrow)
weights[Y == 0] <- w['0']
weights[Y == 1] <- w['1']
table(weights, Y)

#Fit the RF
data <- data.frame(Y=factor(ifelse(Y==0, 'no', 'yes')), X)
model <- ranger(Y~., data, case.weights=weights)
print(model)
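One difference from the scheme in the question is worth noting: with case weights, each tree's sample is balanced only in expectation, whereas BRF draws exactly the same number of cases from each class. The base-R simulation below only imitates weighted bootstrap sampling; treating it as what ranger does per tree with case.weights is an assumption, and the labels are made up to match the answer's 90/10 split.

```r
# Hypothetical labels with the same 90/10 split as the answer's dataset
set.seed(43)
Y <- rep(c(0L, 1L), times = c(900, 100))
n <- length(Y)

# Inverse-frequency weights, built as in the answer
w <- 1 / table(Y)
w <- w / sum(w)
weights <- as.numeric(w[as.character(Y)])

# Each class carries the same total weight...
c(sum(weights[Y == 0]), sum(weights[Y == 1]))

# ...so a weighted bootstrap of n rows (assumed to mirror ranger's per-tree
# sampling with case.weights) gives a minority share near, not exactly, 0.5
draws <- replicate(5, {
  idx <- sample.int(n, n, replace = TRUE, prob = weights)
  mean(Y[idx] == 1)
})
draws
```

If exact per-tree class counts matter, the `strata`/`sampsize` route in randomForest matches the paper's description more closely; the weighting approach trades that exactness for simplicity.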