Bootstrapping seeks to simulate the effect of drawing a new sample from the population; it does not seek to ensure distinct test sets, which are the residues (out-of-bag instances) left over after sampling N items from N with replacement.
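To see the "residue" concretely: when sampling N from N with replacement, the expected out-of-bag fraction is $(1-\frac{1}{N})^N$, which tends to $e^{-1} \approx 0.368$. A minimal base-R sketch (the variable names are illustrative):

```r
# Sketch: the "residue" of a bootstrap draw is the out-of-bag set --
# the items never picked when sampling N from N with replacement.
# Expected OOB fraction is (1 - 1/N)^N, which tends to exp(-1) ~ 0.368.
set.seed(42)
N <- 10000
boot_idx <- sample(N, size = N, replace = TRUE)
oob_fraction <- 1 - length(unique(boot_idx)) / N
round(oob_fraction, 3)  # close to exp(-1) ~ 0.368
```

Note that these out-of-bag sets overlap heavily between bootstrap replicates, which is exactly why they are not "distinct test sets" in the CV sense.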
RxK-fold cross-validation ensures K distinct test folds within each repetition, and is then repeated R times with different random partitionings. The independence assumptions underlying K-fold CV approximately hold for a single partitioning, but they are lost with repetition.
Stratified cross-validation violates the principle that the test labels should never be looked at before the statistics are calculated. This is generally thought to be innocuous, as the only effect is to balance the folds, but it does lead to a loss of diversity (an unwanted loss of variance). It also moves even further from the Bootstrap idea of constructing a sample similar to one you would draw naturally from the whole population. Arguably the main reason stratification is important is to address defects in the classification algorithms, as they are too easily biased by over- or under-representation of classes. An algorithm that uses balancing techniques (either by selection or weighting), or that optimizes a chance-corrected measure (Kappa, or preferably Informedness), is less affected by this, although even such algorithms cannot learn or test a class that isn't there.
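To illustrate "chance-corrected": Informedness (also known as Youden's J) is sensitivity + specificity − 1, so a classifier that merely exploits class prevalence scores 0, unlike raw accuracy. A small sketch (the function name is my own, not from any package):

```r
# Sketch: Informedness (Youden's J) = sensitivity + specificity - 1.
# Unlike accuracy, it is not inflated by class imbalance: a classifier
# that always predicts the majority class scores 0.
informedness <- function(tp, fn, fp, tn) {
  sens <- tp / (tp + fn)   # true positive rate
  spec <- tn / (tn + fp)   # true negative rate
  sens + spec - 1
}
# 95:5 imbalance; "always predict the majority class" still gets 95% accuracy...
tp <- 0; fn <- 5; fp <- 0; tn <- 95
accuracy <- (tp + tn) / (tp + fn + fp + tn)  # 0.95
informedness(tp, fn, fp, tn)                 # 0
```

This is why an algorithm optimizing Informedness is less tempted to simply track class prevalence in imbalanced folds.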
Forcing each fold to have at least m instances of each class, for some small m, is an alternative to stratification that works for both Bootstrapping and CV. It does introduce a smoothing bias, making folds tend to be more balanced than they would otherwise be expected to be.
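A hypothetical sketch of such "forced" fold assignment: seed each of the K folds with up to m instances of every class, then distribute the remaining instances at random. The function `forced_folds` is an illustrative name of my own, not from any package:

```r
# Hypothetical sketch: force at least m instances of each class into
# every fold, then distribute the remaining instances at random.
forced_folds <- function(y, k, m) {
  fold <- integer(length(y))
  for (cls in unique(y)) {
    idx <- sample(which(y == cls))           # shuffle this class's indices
    seed_n <- min(k * m, length(idx))
    seeded <- idx[seq_len(seed_n)]
    # seed each fold with (up to) m instances of this class
    fold[seeded] <- rep(seq_len(k), each = m)[seq_len(seed_n)]
    rest <- setdiff(idx, seeded)
    fold[rest] <- sample(k, length(rest), replace = TRUE)
  }
  fold
}
set.seed(1)
y <- c(rep("a", 90), rep("b", 10))
f <- forced_folds(y, k = 5, m = 1)
table(f, y)  # every fold contains at least one "b"
```

Because the leftovers are placed purely at random, the folds stay closer to a natural draw than full stratification would, while still guaranteeing minimal class representation.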
Re ensembles and diversity: if the classifiers learned on the training folds are used for fusion, not just for estimating generalization error, then the increasing rigidity of CV, stratified Bootstrap and stratified CV leads to a loss of diversity, and potentially of resilience, compared with Bootstrap, forced Bootstrap and forced CV.
You control the size of the tree via the interaction.depth and n.minobsinnode parameters when training a gbm model using caret; I assume there are corresponding parameters in other decision-tree modelling packages as well. interaction.depth controls the maximum number of times a split can stack upon another in a single decision tree, while n.minobsinnode sets the minimum number of observations in a terminal node, which limits how many terminal regions (and therefore how many splits) a tree can form.
To perform a grid search in caret, you first construct a grid containing every combination of the parameters you want to search. Then, in trainControl, you specify the cross-validation folds and repeats to perform for EACH parameter combination.
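A minimal sketch of the two steps above, assuming the caret and gbm packages are available; the parameter values and the iris example are purely illustrative, and the training step is guarded so the snippet degrades gracefully if the packages are missing:

```r
# Step 1: a grid with every combination of the parameters to search.
grid <- expand.grid(
  interaction.depth = c(1, 3, 5),  # max depth of variable interactions
  n.trees           = c(100, 200),
  shrinkage         = 0.1,
  n.minobsinnode    = c(5, 10)     # min observations per terminal node
)
nrow(grid)  # 12 combinations, each evaluated with the repeated CV below

# Step 2: repeated CV applied to EACH row of the grid.
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("gbm", quietly = TRUE)) {
  ctrl <- caret::trainControl(method = "repeatedcv", number = 10, repeats = 3)
  fit <- caret::train(Species ~ ., data = iris, method = "gbm",
                      trControl = ctrl, tuneGrid = grid, verbose = FALSE)
  print(fit$bestTune)
}
```

Note the cost multiplies quickly: 12 combinations × 10 folds × 3 repeats means 360 model fits.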
So I'm not sure what you mean by "cp control parameter"? If your concern is an unbalanced dataset, that's a separate issue.
Best Answer
The quote is quite generic. It usually applies to scenarios where we have class imbalance and "simple $k$-fold" CV might produce training subsamples with no instances (or insufficient instances) of the minority class. In these cases our learning procedure might indeed be presented with an unrepresentative subsample to train with. Note that even with unbalanced samples this becomes less of a problem as the overall sample size grows. There is no hard rule on which we should follow; we might want to ensure that both our training and test samples have instances of both classes, but that is not too hard to ensure. Choosing between stratified CV and simple CV usually comes down to whether we have an indication that our learning procedure is biased due to class imbalance; a usual symptom of this problem is very low recall values, but that is just one facet.
To give some concrete numbers on my point about sample size: let's assume that our minority class represents $1\%$ of our dataset and we use simple $5$-fold CV. Then approximately $(1-\frac{1}{5})^{\frac{1}{100}N}$ of our test subsamples would contain no instances of the minority class, i.e. even with $2100$ samples, less than $1\%$ of our test subsamples would have no minority class instances. Similarly, approximately $(1 - \frac{4}{5})^{\frac{1}{100}N}$ of our training subsamples would contain no instances of the minority class, i.e. even with "just" $300$ samples, less than $1\%$ of our training subsamples would have no minority class instances. Clearly, having just one representative of a given class is not very helpful, but it is evident that, especially when training our classifier, we quickly avoid totally unrepresentative samples. I would recommend reading the CV.SE threads "Why use stratified cross validation? Why does this not damage variance related benefit?" and "When is unbalanced data really a problem in Machine Learning?"; both provide further context on the use of stratified $k$-fold CV and imbalanced learning in particular.
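The two figures above can be checked directly in a couple of lines (the function names are mine, for readability):

```r
# Checking the numbers above: minority class is 1% of the data, 5-fold CV.
# A given minority instance lands outside a given test fold with prob 4/5,
# and outside the training set (i.e. inside the test fold) with prob 1/5.
p_no_minority_in_test  <- function(N) (1 - 1/5)^(N / 100)
p_no_minority_in_train <- function(N) (1 - 4/5)^(N / 100)
p_no_minority_in_test(2100)   # ~0.0092, i.e. below 1%
p_no_minority_in_train(300)   # 0.008, i.e. below 1%
```

Both probabilities decay geometrically in N, which is the formal version of "this becomes less of a problem as the sample size grows".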
I append a small R script that might help build intuition further.