Solved – Reduce Random Forest model memory size

model-evaluation, r, random-forest, regression

I've created a regression model on my data using random forests in R. The resulting object is quite large, and I'm wondering if there's any way to reduce it to only the pieces necessary to make a prediction.

The training data set is also fairly large: 20 variables and ~45,000 rows. My code is listed below.

data <- readRDS("data.Rds")

require("data.table")
require("doParallel")
require("randomForest")

# split into training and test sets, then drop the full table to free memory
train <- data[which(set == "train")]
test <- data[which(set == "test")]
rm(data)

# predictors are in columns 2-21, the response in column 23
x <- data.table(train[, 2:21, with=FALSE])
y <- as.vector(as.matrix(train[, 23, with=FALSE]))

# grow 6 forests of 500 trees each in parallel, then combine them into one model
cl <- makeCluster(detectCores())
registerDoParallel(cl, cores=4)
time <- system.time({rf.fit <- foreach(ntree=rep(500, 6),
                               .combine=combine,
                               .multicombine=TRUE,
                               .packages="randomForest") %dopar%
                   {randomForest(x, y, ntree=ntree)}})
stopCluster(cl)

saveRDS(rf.fit, "rf.fit.Rds")

The output of this is ~230 MB. Once I have the model, is it possible to reduce its size to make it easier to work with? My goals are to identify the important variables and to make predictions on new data.

Best Answer

I used the function below to reduce my default caret output from 137 MB to 3 MB. You can still use the stripped model for prediction via its $finalModel component; a usage sketch follows the function.

## Clean Model to Save Memory

## http://stats.stackexchange.com/questions/102667/reduce-random-forest-model-memory-size
stripRF <- function(cm) {
  # drop the per-observation output stored in the embedded randomForest object
  cm$finalModel$predicted <- NULL
  cm$finalModel$oob.times <- NULL
  cm$finalModel$y <- NULL
  cm$finalModel$votes <- NULL

  # drop caret's resampling indices and its cached copy of the training data
  cm$control$indexOut <- NULL
  cm$control$index    <- NULL
  cm$trainingData <- NULL

  # clear the environments captured by the terms and formula objects, which can
  # otherwise drag large amounts of data into the saved file
  attr(cm$terms, ".Environment") <- c()
  attr(cm$formula, ".Environment") <- c()

  cm
}
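For example, a minimal usage sketch, assuming the model was fitted through caret's train() with method = "rf" on the x and y built in the question (the object and file names here are hypothetical):

## Minimal sketch, assuming a caret-trained random forest (hypothetical names)
library(caret)

fit <- train(x, y, method = "rf", ntree = 500)   # caret wraps randomForest here
fit.small <- stripRF(fit)                        # strip the heavy components
saveRDS(fit.small, "rf.fit.small.Rds")           # much smaller file on disk

## prediction still works through the embedded randomForest object
preds <- predict(fit.small$finalModel, newdata = test[, 2:21, with=FALSE])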
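The model in the question comes straight from randomForest() rather than caret, so the caret-specific pieces (finalModel, control, trainingData) don't exist on it. A minimal sketch of the same idea for a plain randomForest object, assuming only $forest (needed by predict()) and $importance (for variable importance) have to be kept; stripPlainRF and the file name are just illustrative:

## Sketch for a plain randomForest object (hypothetical helper name)
stripPlainRF <- function(rf) {
  rf$predicted <- NULL   # out-of-bag predictions for every training row
  rf$oob.times <- NULL   # how often each row was out-of-bag
  rf$y         <- NULL   # stored copy of the response vector
  rf$votes     <- NULL   # class votes, only present for classification forests
  rf
}

rf.fit.small <- stripPlainRF(rf.fit)
saveRDS(rf.fit.small, "rf.fit.small.Rds")

imp   <- importance(rf.fit.small)                          # importance is kept
preds <- predict(rf.fit.small, test[, 2:21, with=FALSE])   # only needs $forest

With 3,000 trees the $forest component itself may still dominate the object size, so the reduction here can be more modest than for the caret wrapper.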