MATLAB: What fraction of input data is used for out-of-bag observations when creating a TREEBAGGER object using Statistics Toolbox 7.1 (R2009a)


I have created a TREEBAGGER object setting 'oobvarimp' to 'on'. I want to determine what fraction of observations are used as out-of-bag observations.

Best Answer

For every tree, the bagger randomly selects N*bagger.FBoot out of N observations, with replacement by default, for training. Observations that were not selected for training are the out-of-bag observations for that tree. If bagger.FBoot = 1 (the default), on average about 63% (roughly 2/3) of the input data is selected for training each tree, and the remaining 37% or so (roughly 1/3) is out-of-bag. This fraction fluctuates from one tree to another, and the out-of-bag observations for one tree are not identical to those for another tree.
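As a rough check on these numbers, the expected out-of-bag fraction can be worked out directly. This is a sketch, assuming sampling with replacement and n ≈ N·FBoot independent draws per tree; the symbols n and f are introduced here for illustration, and the exact rounding TreeBagger applies to n is not covered.

```latex
% Probability that one given observation is never picked in n draws
% from N observations (sampling with replacement), with f = FBoot:
\[
  P(\text{out-of-bag}) = \left(1 - \frac{1}{N}\right)^{n}
  \approx e^{-n/N} = e^{-f}
  \qquad (n \approx fN,\ N \text{ large}).
\]
% Approximate expected out-of-bag fractions:
\[
  f = 0.5:\ e^{-0.5} \approx 0.607, \qquad
  f = 0.8:\ e^{-0.8} \approx 0.449, \qquad
  f = 1:\ e^{-1} \approx 0.368 .
\]
```

For FBoot = 1 this gives about 0.368, consistent with the "roughly 1/3" figure; a smaller FBoot leaves more observations out-of-bag.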
You can use the following code as an example to determine the fraction of out-of-bag observations per tree.
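load imports-85;              % sample data set; provides the matrix X
Y = X(:,1);                   % first column is used as the response
X = X(:,2:end);               % remaining columns are the predictors
ntrees = 50;
for fboot = [0.5 0.8 1]
    b = TreeBagger(ntrees,X,Y,'oobvarimp','on','FBoot',fboot);
    nobs = size(b.X,1);       % number of observations
    % b.OOBIndices is nobs-by-ntrees; true where an observation is out-of-bag for a tree
    num_oob_per_tree = sum(b.OOBIndices(:))/ntrees;
    frac_oob_observations = num_oob_per_tree/nobs;
    fprintf('\nFor %d trees and FBoot = %g: average out-of-bag fraction = %.3f\n', ntrees, fboot, frac_oob_observations);
end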