Solved – Confidence intervals for proportions (prevalence)


I have data on a sample of $n=1776$ hospitals. For each hospital there is a total number of patients (patients) and a number of patients diagnosed with a particular condition (diagnosed). Should I take the mean of the proportion diagnosed/patients over all hospitals in the sample, $\hat{\mu}$, and calculate a 95% confidence interval as $\hat{\mu} \pm 1.96\sigma / \sqrt{n}$ (with $\sigma$ the standard deviation of the per-hospital proportions), or as $\hat{\mu} \pm 1.96 \sqrt{\hat{\mu}(1-\hat{\mu})/n}$? Or something else?

Update

[Following comments from whuber]. Additionally, the data are broken down into 2 age groups (young and old) and 3 risk scores. That is, all 1776 hospitals have total numbers of patients as follows:

               younger patients       older patients
Low risk            A                      D
Medium risk         B                      E
High risk           C                      F

…and similarly for the numbers of patients with the condition.

So, for each combination of age group and risk score, I would like to estimate the mean prevalence and a confidence interval for it.

Here is a summary of the data:

Risk   age    mean   sd      n
1      u50    0.37   0.19    1776
2      u50    0.49   0.25    1776
3      u50    0.54   0.26    1776
1      o50    0.45   0.36    1776
2      o50    0.52   0.42    1776
3      o50    0.67   0.41    1776

Best Answer

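You could try a nonparametric bootstrap approach, resampling hospitals with replacement and recomputing the mean proportion each time. For example:

library(boot)
# mydata: numeric vector of per-hospital proportions (diagnosed/patients) for one subset
# Statistic passed to boot(): mean of the values selected by the resampled index vector i
the.means <- function(dt, i) mean(dt[i])
boot.obj <- boot(data = mydata, statistic = the.means, R = 10000)
# Percentile 95% confidence interval from the bootstrap replicates
quantile(boot.obj$t, c(0.025, 0.975))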

You can repeat this for each of your 6 subsets of data.
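Repeating the bootstrap over the six subsets could look roughly like the sketch below. The real data are not shown in the question, so the data frame `strata`, its column names, the number of hospitals, and the simulated counts are all hypothetical stand-ins; only the `boot()` call and the percentile interval follow the code above. `R` is reduced to 2000 here just to keep the example quick.

```r
library(boot)  # recommended package that ships with standard R installations

set.seed(1)

# Hypothetical stand-in for the real data: one row per hospital and stratum,
# with made-up patient counts and diagnosed counts.
n_hosp <- 200
strata <- expand.grid(hospital = seq_len(n_hosp),
                      risk = 1:3,
                      age = c("u50", "o50"),
                      stringsAsFactors = FALSE)
strata$patients  <- rpois(nrow(strata), lambda = 30) + 1   # at least 1 patient
strata$diagnosed <- rbinom(nrow(strata), size = strata$patients, prob = 0.4)
strata$prop      <- strata$diagnosed / strata$patients

# Statistic for boot(): mean of the resampled per-hospital proportions.
the.means <- function(x, i) mean(x[i])

# Percentile bootstrap interval for one vector of proportions.
boot_ci <- function(x, R = 2000) {
  b  <- boot(data = x, statistic = the.means, R = R)
  ci <- quantile(b$t, c(0.025, 0.975))
  c(mean = mean(x), lower = unname(ci[1]), upper = unname(ci[2]))
}

# Apply it to each of the six risk-by-age subsets.
groups  <- split(strata$prop, list(strata$risk, strata$age))
results <- t(sapply(groups, boot_ci))
print(round(results, 3))
```

Each row of `results` is one risk-by-age subset, with the observed mean and the lower and upper ends of the percentile interval. `boot.ci(b, type = "perc")` from the same package is an alternative to calling `quantile()` directly. With $n=1776$ hospitals per subset, the bootstrap interval would be expected to land close to $\hat{\mu} \pm 1.96\sigma/\sqrt{n}$; for the first row of the summary table that is roughly $0.37 \pm 1.96 \times 0.19/\sqrt{1776} \approx 0.37 \pm 0.009$.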