Confidence interval for a weighted mean of proportions

confidence interval, weighted mean

I am doing an experiment in which I measure success or failure of a series of $k$ trials, yielding a proportion $P$. I repeat the experiment several times, yielding a set of $N$ proportions, $P_1, \ldots, P_N$, and I would like to compute a mean and confidence interval for these proportions. There are a few things that make this tricky:

  • The experiments are independent, but there might be systematic errors that affect all of the trials within any one experiment (in other words, the trials are pseudo-replicates).

  • The number of trials can vary by as much as a factor of $10$ from experiment to experiment.

  • For reasons related to the design of this specific experiment, I have more confidence in experiments where more trials were conducted.

So far, rather than computing a simple average:

$$P_{est} = \frac1N \sum_{i=1}^N P_i$$

I compute a weighted average proportion:

$$P_{est} = \frac{ \sum_{i=1}^N k_i \cdot P_i}{\sum_{i=1}^N k_i}$$

This is equivalent to the total number of successes divided by the total number of trials across all experiments.
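
To spell out that equivalence, write $s_i = k_i P_i$ for the number of successes in experiment $i$ (the symbol $s_i$ is introduced here only for illustration):

$$P_{est} = \frac{\sum_{i=1}^N k_i \cdot P_i}{\sum_{i=1}^N k_i} = \frac{\sum_{i=1}^N s_i}{\sum_{i=1}^N k_i}$$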

Now, how do I calculate a confidence interval for this? It is not a simple binomial proportion because of the pseudo-replication issue. Nor is it the simple confidence interval for the mean of experiments, because that doesn't take into account the weighting.

Thanks for any suggestions.

Best Answer

Drawing on survey sampling methods, I would estimate $P_{est}$ as you have already done, but estimate its standard error by treating the trials within each experiment as a cluster. This is a simple one-stage cluster design in which the clusters are the individual experiments, and they need not be the same size. Once you have the estimate $\hat{p}_{est}$ and its standard error, you can form the confidence interval in the usual way, $\hat{p}_{est}\pm t^*\times SE[\hat{p}_{est}]$. The standard error is not computed with the usual binomial formula, however, and you lose some precision because of the clustering. The upside is that the interval properly reflects the variation between experiments.
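
The answer does not give a formula for the cluster standard error, so the following is only a minimal sketch of one common choice: the linearization (ratio-estimator) variance for one-stage cluster sampling, with no finite population correction. The function name `cluster_ratio_ci`, the example counts, and the use of a $t$ critical value with $N-1$ degrees of freedom are all assumptions made for illustration, not something stated in the question or answer.

```python
import math


def cluster_ratio_ci(successes, trials, t_crit):
    """Weighted proportion with a cluster-based confidence interval.

    successes[i] and trials[i] are the success count and trial count
    of experiment i (each experiment is treated as one cluster).
    t_crit is the critical value t*, supplied by the caller.
    Assumption: linearization variance of a ratio estimator,
    ignoring any finite population correction.
    """
    n = len(successes)
    if n != len(trials) or n < 2:
        raise ValueError("need matching lists with at least two experiments")

    total_trials = sum(trials)
    # Same as the weighted average of the per-experiment proportions.
    p_hat = sum(successes) / total_trials

    # Residual of each cluster total around what p_hat would predict.
    resid_sq = sum((s - p_hat * k) ** 2 for s, k in zip(successes, trials))

    # Between-cluster variance of the ratio estimator.
    variance = n / (n - 1) * resid_sq / total_trials ** 2
    se = math.sqrt(variance)

    return p_hat, se, (p_hat - t_crit * se, p_hat + t_crit * se)


# Hypothetical data: five experiments of very different sizes.
successes = [8, 45, 12, 70, 20]
trials = [10, 60, 20, 100, 25]

# 2.776 is the 97.5% quantile of a t distribution with 4 degrees of
# freedom, giving a 95% interval under the assumptions above.
p_hat, se, (lower, upper) = cluster_ratio_ci(successes, trials, t_crit=2.776)
print(f"estimate={p_hat:.4f}, SE={se:.4f}, CI=({lower:.4f}, {upper:.4f})")
```

Because the variance is built from the spread of experiment-level totals rather than from individual trials, correlated errors within an experiment widen the interval instead of being ignored. With only a handful of experiments the interval can extend beyond $[0, 1]$, in which case a transformation or a different interval method may be preferable.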
