I have some data on a sample of $n=1776$ hospitals. For each hospital there is a total number of patients (`patients`), and a number of patients diagnosed with a particular condition (`diagnosed`). Do I take the mean of this proportion, `diagnosed/patients`, for all hospitals in the sample, $\hat{\mu}$, and calculate a 95% confidence interval as $\hat{\mu} \pm 1.96\sigma / \sqrt{n}$ or as $\hat{\mu} \pm 1.96 \sqrt{\hat{\mu}(1-\hat{\mu})/n}$? Or…?
Update
[Following comments from whuber]. Additionally, the data are broken down into 2 age groups (young and old) and 3 risk scores. That is, all 1776 hospitals have total numbers of patients as follows:
               younger patients   older patients
Low risk              A                 D
Medium risk           B                 E
High risk             C                 F
…and similarly for the numbers of patients with the condition.
So, for each combination of age group and risk score, I would like to estimate the mean prevalence and a confidence interval for it.
Here is a summary of the data:
Risk   age   mean   sd     n
1      u50   0.37   0.19   1776
2      u50   0.49   0.25   1776
3      u50   0.54   0.26   1776
1      o50   0.45   0.36   1776
2      o50   0.52   0.42   1776
3      o50   0.67   0.41   1776
Best Answer
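You could try a nonparametric bootstrap approach. For example, resample the hospitals with replacement many times, recompute the mean proportion on each resample, and take the 2.5th and 97.5th percentiles of those resampled means as a 95% confidence interval.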
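A minimal sketch of a percentile bootstrap in Python (standard library only) might look like the following. This is one common variant, not necessarily the code originally intended here. The function name `bootstrap_mean_ci`, its parameters, and the demo counts are all made up for illustration; only the names `patients` and `diagnosed` come from the question.

```python
import random
import statistics


def bootstrap_mean_ci(proportions, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-hospital proportions.

    Hypothetical helper for illustration; returns (mean, lower, upper).
    """
    rng = random.Random(seed)
    n = len(proportions)
    boot_means = []
    for _ in range(n_boot):
        # Resample hospitals with replacement, keeping the sample size n.
        resample = rng.choices(proportions, k=n)
        boot_means.append(statistics.fmean(resample))
    boot_means.sort()
    # Approximate alpha/2 and 1 - alpha/2 quantiles of the bootstrap means.
    lower = boot_means[int(n_boot * alpha / 2)]
    upper = boot_means[int(n_boot * (1 - alpha / 2)) - 1]
    return statistics.fmean(proportions), lower, upper


# Synthetic counts for illustration only (NOT the data from the question).
demo_rng = random.Random(1)
patients = [demo_rng.randint(20, 500) for _ in range(200)]
diagnosed = [sum(demo_rng.random() < 0.4 for _ in range(p)) for p in patients]
proportions = [d / p for d, p in zip(diagnosed, patients)]

mean, lower, upper = bootstrap_mean_ci(proportions)
print(f"mean = {mean:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The resampling unit here is the hospital, so each hospital's proportion counts equally regardless of how many patients it has. If a patient-weighted prevalence is wanted instead, the statistic recomputed on each resample would need to change accordingly.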
You can repeat this for each of your 6 subsets of data.