Here's a toy data set that replicates my problem. I am interested in the confidence intervals of an empirical distribution made up of scores from each school, drawn in the same school proportions as student "A"'s scores.
```r
set.seed(1)
rows <- 50
df <- data.frame(student = sample(LETTERS[1:3], rows, replace = TRUE),
                 school  = sample(c("F", "G"), rows, replace = TRUE),
                 score   = sample(1:10, rows, replace = TRUE,
                                  prob = c(rep(0.05, 7), rep(0.2167, 3))))
head(df)
```

```
  student school score
1       A      F     3
2       B      G     9
3       B      F     9
4       C      F     1
5       A      F    10
6       C      F     8
```
In this example: student "A" has 3 scores from school "G" and 9 scores from school "F":
```r
df[df$student == "A", ]
```

```
   student school score
1        A      F     3
5        A      F    10
10       A      F    10
11       A      G     1
12       A      F     6
22       A      G    10
24       A      F     8
25       A      F     7
27       A      G    10
34       A      F    10
38       A      F    10
47       A      F     8
```
How do I generate bootstrap samples of 12 scores that keep student "A"'s school proportions (9 from "F" and 3 from "G")? I need the CI of the expected score of an average student whose scores come from the schools in student "A"'s proportions.
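To make the target concrete, here is a minimal base-R sketch of one such resample, written by hand rather than with `boot`: row positions are drawn with replacement separately within each school, so every resample has the same school counts as the original rows. The helper name `stratified_resample` is hypothetical.

```r
set.seed(1)
rows <- 50
df <- data.frame(student = sample(LETTERS[1:3], rows, replace = TRUE),
                 school  = sample(c("F", "G"), rows, replace = TRUE),
                 score   = sample(1:10, rows, replace = TRUE,
                                  prob = c(rep(0.05, 7), rep(0.2167, 3))))
dA <- df[df$student == "A", ]

# Hypothetical helper: resample row positions with replacement within each
# school, so the F/G counts of `d` are preserved in every resample.
# g[sample.int(length(g), replace = TRUE)] avoids sample()'s special
# behaviour when g has length 1.
stratified_resample <- function(d) {
  by_school <- split(seq_len(nrow(d)), d$school)
  idx <- unlist(lapply(by_school,
                       function(g) g[sample.int(length(g), replace = TRUE)]),
                use.names = FALSE)
  d[idx, ]
}

one <- stratified_resample(dA)
table(dA$school)   # school counts in student "A"'s original rows
table(one$school)  # the resample has the same counts

# Bootstrap distribution of the mean score under this scheme
means <- replicate(2000, mean(stratified_resample(dA)$score))
```

This is exactly what a stratified ordinary bootstrap does; the `strata` argument of `boot` automates it.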
I looked through the help page for the `boot` function in the "boot" package. There is an example of a stratified bootstrap, but I don't understand what `stype` does. I understand `stype = "i"`, but I don't understand what happens with `stype = "w"` or `"f"`, or how to use them.
Best Answer
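`stype` tells `boot` what the second argument of your `statistic` function represents: `"i"` (the default) passes the indices of the resampled observations, `"f"` passes frequencies (how many times each original observation appears in the resample), and `"w"` passes weights derived from those frequencies. The `"f"` and `"w"` forms are convenient when the statistic is naturally computed as a weighted quantity. In your case, I don't think they apply.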
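A small sketch of the three forms, assuming a hypothetical numeric vector `x` of scores and hypothetical statistic names: each writes the same mean in the shape its `stype` expects. With the same seed, `boot` should draw the same resamples in all three cases, so the replicates should agree.

```r
library(boot)

set.seed(1)
x <- sample(1:10, 20, replace = TRUE)   # hypothetical scores

# stype = "i": second argument = indices of the resampled observations
mean_i <- function(d, i) mean(d[i])
# stype = "f": second argument = how many times each observation was drawn
mean_f <- function(d, f) sum(d * f) / sum(f)
# stype = "w": second argument = weights derived from those frequencies;
# dividing by sum(w) makes the mean independent of how they are scaled
mean_w <- function(d, w) sum(d * w) / sum(w)

set.seed(2); b_i <- boot(x, mean_i, R = 500, stype = "i")
set.seed(2); b_f <- boot(x, mean_f, R = 500, stype = "f")
set.seed(2); b_w <- boot(x, mean_w, R = 500, stype = "w")

# Compare the bootstrap replicates of the three versions
all.equal(b_i$t, b_f$t)
all.equal(b_i$t, b_w$t)
```

For a plain mean the index form is the simplest, which is why `stype = "i"` is enough here.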
Most likely you need to split the data frame by student first and apply `boot` to each group separately; this ensures each bootstrap sample has the same number of observations as that student (A, B or C). Within each group, pass the school as `strata` so that every resample keeps that student's school proportions. Below I apply a function to get the mean:
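A sketch of that approach, assuming the toy data from the question; the names `mean_score` and `boot_by_student` are illustrative. `strata` is given as a factor because `boot` converts it to numeric codes internally.

```r
library(boot)

set.seed(1)
rows <- 50
df <- data.frame(student = sample(LETTERS[1:3], rows, replace = TRUE),
                 school  = sample(c("F", "G"), rows, replace = TRUE),
                 score   = sample(1:10, rows, replace = TRUE,
                                  prob = c(rep(0.05, 7), rep(0.2167, 3))))

# Statistic for the default stype = "i": mean score of the resampled rows
mean_score <- function(d, i) mean(d$score[i])

# One bootstrap per student. strata = school resamples within each school,
# so every replicate keeps that student's original F/G counts.
set.seed(123)
boot_by_student <- lapply(split(df, df$student), function(d) {
  boot(d, statistic = mean_score, R = 2000, strata = factor(d$school))
})

boot_by_student$A   # original mean (t0), bootstrap bias and std. error
```

Each element of `boot_by_student` is an ordinary `"boot"` object, so anything that accepts one (such as `boot.ci`) can be applied per student.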
Then, to get the confidence intervals:
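A minimal, self-contained sketch for student "A" alone, assuming the same toy data and the illustrative statistic `mean_score`; `boot.ci` returns several interval types from one stratified `boot` object.

```r
library(boot)

set.seed(1)
rows <- 50
df <- data.frame(student = sample(LETTERS[1:3], rows, replace = TRUE),
                 school  = sample(c("F", "G"), rows, replace = TRUE),
                 score   = sample(1:10, rows, replace = TRUE,
                                  prob = c(rep(0.05, 7), rep(0.2167, 3))))
dA <- df[df$student == "A", ]

mean_score <- function(d, i) mean(d$score[i])

# Stratified bootstrap of student "A"'s rows (school proportions preserved)
set.seed(123)
bA <- boot(dA, statistic = mean_score, R = 2000, strata = factor(dA$school))

# Normal-approximation, basic and percentile 95% intervals
ciA <- boot.ci(bA, conf = 0.95, type = c("norm", "basic", "perc"))
ciA
ciA$percent[1, 4:5]   # lower and upper limits of the percentile interval
```

For every student at once, `lapply(boot_by_student, boot.ci, type = "perc")` applies the same call to each element of a per-student list of `boot` objects.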