I think your questions are very interesting. I spend some of my time looking at the effective mapping curvature of random forest (RF) model fits. RF can capture some orders of interaction, depending on the situation; x1 * x2 is a two-way interaction, and so on. You did not write how many levels your categorical predictors have, and it matters a lot. For continuous variables (many levels), often no more than multiple local two-way interactions can be captured. The problem is that the RF model itself only splits the data and does not transform it. RF is therefore stuck with local univariate splits, which are not optimal for capturing interactions, and in that sense RF is fairly shallow compared to deep learning.
At the complete other end of the spectrum are binary features. I did not know how deep RF can go, so I ran a grid-search simulation. For binary features, RF seems to capture interactions up to roughly the 4th to 8th order. I use 12 binary variables and 100 to 15000 observations. E.g. for the 4th order interaction, the target vector y is:
orderOfInteraction = 4
y = factor(apply(X[,1:orderOfInteraction],1,prod))
where every element of X is either -1 or 1, and the product of the first four columns of X is the target. All four variables are completely complementary, so there are no main effects and no 2nd or 3rd order effects. The OOB prediction error will therefore reflect only how well RF can capture an interaction of the Nth order.
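The claim that a pure 4th order product has no lower-order effects can be checked directly. The following is a minimal sketch in base R, not part of the simulation itself: it assumes a balanced full-factorial design (all 16 sign combinations) instead of random sampling, and the name max_abs_cor is only illustrative.

```r
# Full factorial design: every combination of four -1/1 variables (16 rows)
X <- as.matrix(expand.grid(rep(list(c(-1, 1)), 4)))
y <- apply(X, 1, prod)  # pure 4th order interaction

# For each order k, build every product of k distinct columns and
# record the largest absolute correlation with y
max_abs_cor <- sapply(1:4, function(k) {
  terms <- combn(4, k, function(idx) apply(X[, idx, drop = FALSE], 1, prod))
  max(abs(cor(terms, y)))
})
names(max_abs_cor) <- paste0("order", 1:4)
print(round(max_abs_cor, 10))
```

In this balanced design the correlations for orders 1 to 3 are zero and only the full 4th order product correlates with y. With randomly sampled rows, as in the simulation, the lower-order correlations are only approximately zero.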
Things that help RF capture higher orders of interaction: plenty of observations, few levels in the variables, few variables.
Limiting factors for RF capturing higher orders: the opposite of the above, plus limited sampsize, limited maxnodes and redundant/sufficient lower-order information.
The last one means that if RF can find the same information in low-order interactions, there is, so to say, no need to go deeper. The information does not even have to be redundant. It just has to be sufficient for RF to make correct binary predictions.
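A rough back-of-the-envelope for why observations and maxnodes matter, offered as a sketch and not as a result of the simulation: a single tree needs 2^N terminal nodes to represent a pure N-way product of binary variables exactly, so it cannot do so if maxnodes is below 2^N or if there are fewer observations than cells. The names leaves_needed and obs_per_leaf are illustrative, and the calculation ignores bootstrapping and the extra noise variables, both of which make things harder in practice.

```r
intOrders <- c(2, 3, 4, 5, 6, 12)                 # same grid as the simulation
obss      <- c(100, 500, 1000, 3500, 7000, 15000)

# Number of distinct sign patterns (cells) of an N-way interaction
leaves_needed <- 2^intOrders

# Average number of observations available per cell
obs_per_leaf <- outer(obss, leaves_needed, "/")
dimnames(obs_per_leaf) <- list(obs = obss, intOrder = intOrders)
print(round(obs_per_leaf, 1))
```

For the 12th order interaction there are 4096 cells, so even 15000 observations leave fewer than four per cell on average, and 100 observations cannot cover the cells at all.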
Depth of random forest: OOB err.rate vs. observations vs. order of interaction
rm(list=ls())
library(randomForest)
library(parallel)
library(rgl)
simulate.a.forest = function(std.pars,ite.pars) {
  #Merge standard parameters with iterated parameters
  run.pars = c(std.pars,ite.pars)
  #simulate data of a given order
  X = replicate(run.pars$vars,sample(c(-1,1),run.pars$obs,replace=TRUE))
  y = factor(apply(X[,1:run.pars$intOrder],1,prod))
  #run forest with run.pars, parameters with unknown names are ignored
  rfo = do.call(randomForest, run.pars)
  #Fetch the final OOB error rate (last row, first column of err.rate) and return
  out = rev(rfo$err.rate[,1])[1]
  names(out) = paste(ite.pars,collapse="-")[1]
  return(out)
}
## Let's try some situations (you can also pass arguments to randomForest here)
intOrders = c(2,3,4,5,6,12) #hidden signal is an N-way interaction, i.e. of Nth order
obss = c(100,500,1000,3500,7000,15000) #available observations
## Produce list of all possible combinations of parameters
ite.pars.matrix = expand.grid(intOrder=intOrders,obs=obss)
n.runs = dim(ite.pars.matrix)[1]
ite.pars.list = lapply(1:n.runs, function(i) ite.pars.matrix[i,])
i=1 ##for test-purposes
out = mclapply(1:n.runs, function(i){
  #line below can be run alone without mclapply to check for errors before going multicore
  out = simulate.a.forest(std.pars=alist(x=X,y=y,
                                         ntree=250,
                                         vars=12),
                                         #sampsize=min(run.pars$obs,2000)),
                          ite.pars=ite.pars.list[[i]])
  return(out)
})
##view grid results
persp3d(x = intOrders,xlab="Nth order interaction",
        y = log(obss,base=10),ylab="log10(observations)",
        z = matrix(unlist(out),nrow=length(intOrders)),zlab="OOB prediction error, binary target",
        col=c("grey","black"),alpha=.2)
rgl.snapshot(filename = "aweSomePlot.png", fmt = "png", top = TRUE)
Best Answer
Basically there are two differences:
Some variants I saw: