Solved – Stuck on a Monte Carlo Simulation of Confidence Intervals

confidence intervalr

I'm trying to run a Monte Carlo simulation to check the coverage of Wald confidence interval, with n=40, p=0.2, and nominal confidence levels of 95%, with 10,000 simulation runs. I need to calculate the empirical coverage.
I am not sure what I actually need to do to answer. I think I have the majority of it done, but I'm not sure on how to complete it.

n <- 40
X <- rbinom(10000, n, 0.2)
pHat <- X/n
a <- 0.05
q <- 1-pHat
k <- qnorm(1-a/2)
#wald CI
waldU <- (X/n)+k*n^(-1/2)*(pHat*q)^(1/2)
waldL <- (X/n)-k*n^(-1/2)*(pHat*q)^(1/2)

After running this I have 10,000 upper limits and 10,000 lower limits, should my next step be to average each one?

Best Answer

As noted by ocram, next you need to calculate the proportion of intervals containing the true value

mean(waldL <= p & p <= waldU)

where p is the true probability of success. Notice that the value of n (in rbinom(nsamples, n, p)) limits the number of distinct pHat values to the set (1:n)/n, so with n equal to 40 you should not expect exact $100\alpha\%$ coverage, but it should get closer as n grows.


A coding side-note: don't use x^(1/2) for square root, use sqrt instead. In all (or at least majority) of the programming languages sqrt is more computationally efficient then ^(1/2), for example a quick benchmark in R shows that it is approximately six times faster:

     test replications elapsed relative user.self sys.self user.child sys.child
1 sqrt(x)          500    0.48    1.000      0.37     0.08         NA        NA
2 x^(1/2)          500    2.87    5.979      2.72     0.12         NA        NA