Solved – How to estimate the true distribution using the bootstrap method

bootstrap estimation

I would like to estimate the true distribution of the following data set using the bootstrap method.

age = (21,81,85,27,39,61,15,20,39,40,87,87,69,59,54,71,66,88,1,2)

The population size is 20.

Can you please help me with both the theoretical background and the R code?

Best Answer

Bootstrap won't give you the "true" distribution of your variable of interest, but rather an approximation that can be helpful in estimating parameters of the true distribution.

The idea is very simple: you draw $N$ cases with replacement from your dataset of $N$ observations, mimicking the way the data were sampled from the population. In R that would look like this:

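age <- c(21,81,85,27,39,61,15,20,39,40,87,87,69,59,54,71,66,88,1,2)
N <- length(age)  # 20 observations

# each row of age_boot holds one bootstrap sample of size N
age_boot <- matrix(NA, 100, N)
for (i in 1:100) {
  # draw N values from the data, with replacement
  age_boot[i, ] <- sample(age, N, replace=TRUE)
}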

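or, in a simpler but more "hacky" way (note that replicate() stores each bootstrap sample in a column, so the result is a 20 × 100 matrix, i.e. the transpose of the loop version):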

age_boot <- replicate(100, sample(age, N, replace=TRUE))

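By computing an empirical estimate (e.g. mean, mode, variance) on each bootstrap sample, you obtain an approximate sampling distribution of that estimate, which you can use to assess parameters of the distribution of your variable and the uncertainty around them.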
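As a rough sketch of that last step, here is how one might bootstrap the mean of age and summarise the result. The number of replicates B, the seed, and the names boot_means, se_boot and ci_boot are choices made for this illustration, not part of the original answer, and the percentile interval is only one of several ways to build a bootstrap confidence interval.

```r
# Sketch: bootstrap distribution of the sample mean.
# B, the seed and the variable names below are illustrative assumptions.
set.seed(42)   # only for reproducibility
age <- c(21,81,85,27,39,61,15,20,39,40,87,87,69,59,54,71,66,88,1,2)
N <- length(age)
B <- 2000      # number of bootstrap replicates (assumed value)

# replicate() returns an N x B matrix: each column is one bootstrap sample
age_boot <- replicate(B, sample(age, N, replace = TRUE))

# apply the statistic of interest to every bootstrap sample (column)
boot_means <- apply(age_boot, 2, mean)

# bootstrap estimate of the standard error of the mean
se_boot <- sd(boot_means)

# 95% percentile interval for the mean
ci_boot <- quantile(boot_means, c(0.025, 0.975))

print(mean(age))   # estimate from the original data
print(se_boot)
print(ci_boot)
```

Replacing mean with another function (e.g. var or median) in the apply() call gives the corresponding bootstrap distribution for that statistic. With only 20 observations the results should be read with caution.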

As for references, see the original paper by Efron (1979) and the two books referenced here. You can find a further description of the bootstrap in this thread: Explaining to laypeople why bootstrapping works