Solved – Increasing sample size of multivariate data with bootstrapping

Tags: bootstrap, python, r, sample-size, small-sample

I'm trying to apply variable selection methods to a small data set (approximately 100 observations with 14 predictors each). Is there any way to increase the sample size with bootstrapping? Are there any methods for bootstrapping multivariate data? If so, what are the concerns about bootstrapping multivariate data to increase sample size, and how can I generate this data in R or Python?

Best Answer

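  1. Is there any way to increase the sample size with bootstrapping? No. The bootstrap is a technique for approximating the sampling distribution of an estimator, most often to obtain standard errors or confidence intervals for complex estimators. It is not a data-generating technique: every resampled row is a copy of a row you already have, so resampling adds no new information.
  2. Are there any methods for bootstrapping multivariate data? Yes. Usually, whole rows are resampled together (see the code below), which keeps the values of all variables within an observation together. This strategy may fail if the order of the rows is relevant to the problem, as in time series analysis.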
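To make the first point concrete, here is a minimal Python sketch (Python being one of the two languages the question asks about). It assumes a made-up toy data set of 100 rows with 14 predictors; the variable names and the target size of 1000 are illustrative only. It "inflates" the data by resampling rows with replacement and then counts how many distinct rows the result contains.

```python
import random

rng = random.Random(0)

# Hypothetical toy data (an assumption for illustration):
# 100 observations, 14 predictors each, stored as tuples so rows are hashable.
rows = [tuple(rng.gauss(0, 1) for _ in range(14)) for _ in range(100)]

# Try to "increase the sample size" to 1000 by resampling whole rows
# with replacement.
inflated = rng.choices(rows, k=1000)

# Count the distinct rows in the inflated data set.
n_distinct = len(set(inflated))

print(len(inflated))  # number of rows after resampling
print(n_distinct)     # can never exceed the 100 original rows
```

However large the resample, it only contains repeated copies of the original observations, which is why it cannot stand in for a larger sample in variable selection.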

For example, to bootstrap the rows of a multivariate data set $X$ in R:

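# Draw one bootstrap sample: resample whole rows with replacement
getBootSample <- function(X) {
  idx <- sample(seq_len(nrow(X)), replace = TRUE)
  X[idx, , drop = FALSE]  # drop = FALSE keeps the result two-dimensional
}

set.seed(5)  # for reproducibility
head(getBootSample(iris))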
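Since the question also asks about Python, the following is a rough counterpart using only the standard library. It is a sketch under assumptions, not a reference implementation: `get_boot_sample` and `pearson` are hypothetical helper names, the two-column data set is simulated, and the choice of 1000 replicates is arbitrary. It resamples whole rows and then uses the replicates for what the bootstrap is meant for, estimating the standard error of a statistic (here the correlation between two columns).

```python
import math
import random


def get_boot_sample(rows, rng):
    """Resample whole rows with replacement, keeping each row intact."""
    return rng.choices(rows, k=len(rows))


def pearson(rows):
    """Pearson correlation between the first two columns of `rows`."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    n = len(rows)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(5)

# Simulated data (an assumption for illustration): 100 rows, y depends on x.
data = []
for _ in range(100):
    x = rng.gauss(0, 1)
    data.append((x, 0.5 * x + rng.gauss(0, 1)))

# Recompute the statistic on many bootstrap samples of the rows.
boot_stats = [pearson(get_boot_sample(data, rng)) for _ in range(1000)]

# The standard deviation of the replicates estimates the standard error.
mean_b = sum(boot_stats) / len(boot_stats)
boot_se = math.sqrt(
    sum((b - mean_b) ** 2 for b in boot_stats) / (len(boot_stats) - 1)
)

print("correlation on the original data:", pearson(data))
print("bootstrap standard error:", boot_se)
```

Because each row is drawn as a unit, the pairing of the two columns is preserved in every replicate; resampling the columns independently would destroy the correlation being studied.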