Solved – Implementing Balanced Random Forest (BRF) in R using randomForest

Hi, I am developing a fraud prediction model. Because this is a highly unbalanced classification problem, I have chosen to try to address it with Random Forests.

Inspired by this article (http://statistics.berkeley.edu/sites/default/files/tech-reports/666.pdf), I have chosen to try Balanced Random Forests.

So far, I am not sure how to implement these forests in R. The article suggests the following: for each iteration of the random forest, draw a bootstrap sample from the minority class, then randomly draw the same number of cases, with replacement, from the majority class.
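The per-tree sampling step from the article can be sketched in base R as below. `balanced_bootstrap()` is a hypothetical helper written for illustration, not part of any package, and the toy 95/5 response is made up. It draws `n_min` rows with replacement from every class, where `n_min` is the minority-class size, which covers both halves of the recipe.

```r
# Hypothetical helper (not from any package): row indices for one tree's
# balanced bootstrap sample, following the BRF recipe quoted above.
balanced_bootstrap <- function(y) {
  y <- droplevels(as.factor(y))
  n_min <- min(table(y))                  # size of the minority class
  idx_by_class <- split(seq_along(y), y)  # row indices grouped by class
  # For every class (minority included), draw n_min rows with replacement.
  drawn <- lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), n_min, replace = TRUE)]
  })
  unlist(drawn, use.names = FALSE)
}

# Usage on a toy 95/5 response (illustrative data only).
set.seed(1)
y <- factor(c(rep("no", 95), rep("yes", 5)))
idx <- balanced_bootstrap(y)
table(y[idx])  # 5 "no" and 5 "yes"
```

In the full BRF procedure, one unpruned classification tree is then grown on each such sample, searching only a random subset of the predictors at every split, and the trees' predictions are aggregated.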

Is this achieved by specifying these parameters?

replace = TRUE  
strata = fraud.variable  
sampsize = c(x, x), where x is the number of cases to draw from each class
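A minimal sketch of those three arguments in a `randomForest()` call, assuming the randomForest package and a two-class factor response. The simulated data and the names `n_min` and `rf` are illustrative only. With `strata = y` and `sampsize = c(n_min, n_min)`, each tree's bootstrap sample should contain `n_min` cases from each class, which mirrors the BRF recipe; the entries of `sampsize` follow the order of `levels(y)`.

```r
# Simulated 90/10 data (illustrative only).
set.seed(43)
n <- 1000
p <- 10
X <- matrix(rnorm(n * p), ncol = p)
colnames(X) <- paste0("x", seq_len(p))
score <- drop(X %*% rnorm(p)) + rnorm(n)
y <- factor(ifelse(score > quantile(score, 0.90), "fraud", "ok"))

n_min <- min(table(y))  # size of the minority class (100 here)

# Requires the randomForest package.
if (requireNamespace("randomForest", quietly = TRUE)) {
  rf <- randomForest::randomForest(
    x = X, y = y,
    ntree = 500,
    replace = TRUE,             # bootstrap, i.e. sample with replacement
    strata = y,                 # stratify each tree's sample by class
    sampsize = c(n_min, n_min)  # n_min cases from each class per tree
  )
  print(rf)
}
```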

Best Answer

library(ranger)  # Best random forest implementation in R

# Make a dataset
set.seed(43)
nrow <- 1000
ncol <- 10
X <- matrix(rnorm(nrow * ncol), ncol = ncol)
CF <- rnorm(ncol)
Y <- (X %*% CF + rnorm(nrow))[, 1]
Y <- as.integer(Y > quantile(Y, 0.90))
table(Y)

# Compute weights to balance the RF
w <- 1 / table(Y)
w <- w / sum(w)
weights <- rep(0, nrow)
weights[Y == 0] <- w['0']
weights[Y == 1] <- w['1']
table(weights, Y)

# Fit the RF
data <- data.frame(Y = factor(ifelse(Y == 0, 'no', 'yes')), X)
model <- ranger(Y ~ ., data, case.weights = weights)
print(model)
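The weighting in the answer balances the forest because inverse-frequency weights give each class the same total weight. ranger documents `case.weights` as sampling weights for each tree's bootstrap sample, so every sample is balanced in expectation, although, unlike the BRF recipe, the per-class counts vary from tree to tree. The base-R sketch below checks both points on a toy response that is not from the original post; `sample.int(..., prob = weights)` is only a stand-in for ranger's internal weighted draw, assumed to behave similarly.

```r
# Toy 90/10 response (illustrative only, not the answer's simulated data).
Y <- rep(c(0L, 1L), times = c(900, 100))

# Inverse-frequency weights, computed as in the answer.
w <- 1 / table(Y)
w <- w / sum(w)
weights <- as.numeric(w[as.character(Y)])  # one weight per row

# Each class carries the same total weight (about 90 each here) ...
class_totals <- tapply(weights, Y, sum)
print(class_totals)

# ... so a weighted bootstrap of size n draws roughly half its rows from each class.
set.seed(43)
idx <- sample.int(length(Y), length(Y), replace = TRUE, prob = weights)
print(mean(Y[idx] == 1))  # close to 0.5
```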
