Bootstrapping in R using the boot {boot} and Boot {car}

Tags: bootstrap, r, resampling

I'm trying my hand at resampling techniques with a dataset I have, and I think either I'm missing a conceptual point about bootstrapping or I'm doing something incorrectly in R. Basically, I'm trying to use it in a correlation/regression framework. I'm able to get the original coefficients, the bootstrap bias, and the bootstrap coefficients, but I can't find a way to have R easily display the bootstrap model $R^2$ (when I'm working with several predictors), the Pearson $r$, or the $p$-values for individual regression coefficients. (I'm using the Boot function in the car package.)

A secondary question: the more general function boot in the boot package requires defining a function to pass in as an argument. That function must take the original data set as its first argument and, as its second argument, a set of indices, frequencies, or weights for the bootstrap sample. I'm a little confused by this. Conceptually, what are these indices I am specifying, and how do I specify them syntactically within my function?
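The three kinds of second argument mentioned here correspond to the `stype` argument of `boot()`: `"i"` for indices (the default), `"f"` for frequencies, and `"w"` for weights. A minimal sketch, assuming the boot package is installed and using a hypothetical one-column data frame `d` built from the values in the answer below, writes the same mean statistic all three ways:

```r
library(boot)

# Hypothetical data: the six values used in the answer below
d <- data.frame(x = c(13.98, 14.29, 16.91, 11.23, 16.64, 15.96))

# stype = "i" (default): second argument is a vector of row indices
mean_i <- function(data, indices) mean(data$x[indices])

# stype = "f": second argument says how many times each row was drawn
mean_f <- function(data, freqs) sum(data$x * freqs) / sum(freqs)

# stype = "w": second argument is a vector of weights, one per row
mean_w <- function(data, weights) sum(data$x * weights) / sum(weights)

set.seed(1)
b_i <- boot(d, mean_i, R = 999)
b_f <- boot(d, mean_f, R = 999, stype = "f")
b_w <- boot(d, mean_w, R = 999, stype = "w")

# Each original estimate (t0) is just the ordinary sample mean
c(b_i$t0, b_f$t0, b_w$t0)
```

With frequencies or weights the statistic has to apply them itself rather than subset rows, which is why the index form (the default) is usually the simplest to write.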

Best Answer

The exact issue in your first question isn't really clear to me (perhaps a small reproducible example would help?), but I can explain the second:

You don't specify the indices yourself. The set of indices is what the bootstrap function generates and passes to your function to say 'use these observations'.
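A minimal base-R sketch of what that means for your function, using a hypothetical data frame `d` holding the six values from the example below. Calling the statistic by hand with the identity indices gives the ordinary estimate (this mirrors how boot computes its original estimate `t0`), and calling it with a resampled index vector gives one bootstrap replicate:

```r
# Hypothetical data frame holding the six values from the example below
d <- data.frame(x = c(13.98, 14.29, 16.91, 11.23, 16.64, 15.96))

# A statistic in the form boot expects: data first, indices second
mean_stat <- function(data, indices) {
  mean(data$x[indices])   # use only the observations that were picked
}

# Identity indices 1:n: the ordinary sample mean
mean_stat(d, seq_len(nrow(d)))

# A resampled index vector: one bootstrap replicate
idx <- sample(nrow(d), replace = TRUE)
mean_stat(d, idx)
```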

For example, let's take a super-simple case: say I was just calculating a mean.

Here's my sample:

Index    value
  1      13.98
  2      14.29
  3      16.91
  4      11.23
  5      16.64
  6      15.96

So the first time through, the bootstrap routine samples the numbers 1 to 6 with replacement, as if it had done this:

> sample(6,replace=TRUE)
[1] 1 6 3 2 3 6

It passes me those indices, so I know to use this as my first bootstrap pseudo-sample:

Index    value
  1      13.98
  6      15.96
  3      16.91
  2      14.29
  3      16.91
  6      15.96

and my first bootstrap statistic would just be the mean of that pseudo-sample.
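In code, that first pseudo-sample and its statistic look like this (a small sketch that hard-codes the first set of indices from the example instead of drawing them at random):

```r
x <- c(13.98, 14.29, 16.91, 11.23, 16.64, 15.96)
idx1 <- c(1, 6, 3, 2, 3, 6)   # the first set of indices from the example
pseudo1 <- x[idx1]            # 13.98 15.96 16.91 14.29 16.91 15.96
mean(pseudo1)                 # 94.01 / 6, about 15.67
```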

Then it passes me another set of indices, as if it had called sample again:

> sample(6,replace=TRUE)
[1] 5 3 2 1 6 5

so I know to use this as my second pseudo-sample:

Index    value
  5      16.64
  3      16.91
  2      14.29
  1      13.98
  6      15.96
  5      16.64

and my second bootstrap statistic would just be the mean of that pseudo-sample, and so forth.
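Repeating that many times is all the ordinary nonparametric bootstrap does. A hand-rolled base-R sketch of the loop (the replicate count `R` and the seed are arbitrary choices, not something boot requires) that also computes the bias and standard error boot reports:

```r
x <- c(13.98, 14.29, 16.91, 11.23, 16.64, 15.96)
n <- length(x)
R <- 2000   # number of bootstrap replicates (an arbitrary choice)

set.seed(42)
# Each replicate: draw n indices with replacement, then take the mean of those values
boot_means <- replicate(R, {
  idx <- sample(n, replace = TRUE)
  mean(x[idx])
})

mean(boot_means) - mean(x)   # bootstrap estimate of bias
sd(boot_means)               # bootstrap standard error
```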

In practice, all you should need to do inside your function is use the index to select the appropriate rows:

mydataframe[index,]
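The same row selection works for a regression, and because the statistic can return any vector, it can include the model $R^2$ alongside the coefficients. A minimal sketch, assuming the boot package is installed and using the built-in `mtcars` data with an arbitrary model `mpg ~ wt + hp` as a stand-in for your own data:

```r
library(boot)

# Hypothetical example: built-in mtcars data, not the asker's data set
reg_stat <- function(data, indices) {
  d <- data[indices, ]                      # the bootstrap pseudo-sample
  fit <- lm(mpg ~ wt + hp, data = d)
  c(coef(fit), r.squared = summary(fit)$r.squared)
}

set.seed(123)
b <- boot(mtcars, reg_stat, R = 999)
b                                       # original, bias, std. error per element
boot.ci(b, type = "perc", index = 4)    # percentile CI for R^2 (4th element)
```

As far as I know, car's Boot() takes a similar function of the fitted model through its `f` argument, but check its help page before relying on that.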

or, if there's only a single column as here, you may want to use drop=FALSE so the result stays a data frame instead of being simplified to a vector:

> mydataframe
      x
1 13.98
2 14.29
3 16.91
4 11.23
5 16.64
6 15.96
> index <- sample(6,replace=TRUE)
> mydataframe[index,,drop=FALSE]
        x
6   15.96
4   11.23
2   14.29
1   13.98
1.1 13.98
3   16.91
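To see what `drop=FALSE` changes, here is a small sketch that hard-codes the indices implied by the output above and compares the two forms:

```r
mydataframe <- data.frame(x = c(13.98, 14.29, 16.91, 11.23, 16.64, 15.96))
index <- c(6, 4, 2, 1, 1, 3)   # indices implied by the output above

v  <- mydataframe[index, ]                 # single column: simplified to a numeric vector
df <- mydataframe[index, , drop = FALSE]   # stays a one-column data frame

class(v)       # "numeric"
class(df)      # "data.frame"
rownames(df)   # duplicated row "1" is renamed "1.1"
```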