Standard Error – Calculating Proportion Error with Small Sample Sizes

proportion; small-sample; standard error

The standard error for a proportion, when n > 5 and np > 5, is calculated as $$SE = \sqrt{ \frac{p(1-p)}{n}}$$ where p is the proportion and n is the sample size.

However, for even smaller samples, we were given the equation $$SE=\frac{1-e^{\frac{\log(0.05)}{n}}}{1.96}$$

Is the second formula a valid way to calculate standard error of a proportion for very small samples? What is the rationale behind it?

Best Answer

No.

The standard error of the mean (SEM, or simply standard error) measures how close the 'true' population mean is likely to be to the sample mean. It is the standard deviation (SD) divided by the square root of the sample size, and together with the sample mean it gives a 95% confidence interval for the population mean: sample mean ± 1.96 × SEM.
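As a rough illustration of the paragraph above applied to a proportion: a 0/1 outcome has SD equal to sqrt(p(1-p)), so SD divided by sqrt(n) gives the first formula in the question. The function name `wald_interval` and the example counts are made up for illustration; this is only a sketch of the normal-approximation case.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Normal-approximation standard error and interval for a proportion.

    Hypothetical helper: SD of a 0/1 variable is sqrt(p * (1 - p)),
    so the standard error is SD / sqrt(n).
    """
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se, (p - z * se, p + z * se)

# Illustrative data: 40 positives out of 100
p, se, (lo, hi) = wald_interval(40, 100)
print(f"p = {p:.3f}, SE = {se:.4f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Note that this interval is symmetric about p by construction, which is exactly what stops being reasonable for small n or p near 0 or 1.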

For 'large' samples we can use a normal approximation; two rules of thumb in common use are that np and n(1-p) should both be greater than 5, or both greater than 10. For small samples or extreme proportions, where the rule of thumb does not hold, the interval within which the population proportion is likely to lie is not symmetrical about the sample proportion. This means that the SD and the SEM cannot be usefully defined. The best alternative is to calculate and report a confidence interval for the population proportion; this page includes a calculator for this using the Clopper-Pearson method.
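For readers without access to a calculator, the Clopper-Pearson interval can be sketched in plain Python by inverting the binomial CDF with bisection. The helper names (`binom_cdf`, `solve_p`, `clopper_pearson`) and the example counts are assumptions for illustration, not part of any library; statistical packages normally compute the same bounds from beta quantiles.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve_p(k, n, target):
    """Bisect for the p in [0, 1] where binom_cdf(k, n, p) equals target.

    For k < n the CDF falls monotonically from 1 to 0 as p goes from 0 to 1.
    """
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > target:
            lo = mid  # CDF still too large, so p must be larger
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for x successes in n trials."""
    lower = 0.0 if x == 0 else solve_p(x - 1, n, 1 - alpha / 2)
    upper = 1.0 if x == n else solve_p(x, n, alpha / 2)
    return lower, upper

# Illustrative data: 2 positives out of 10
lower, upper = clopper_pearson(2, 10)
print(f"2/10: 95% CI = ({lower:.4f}, {upper:.4f})")

# Zero successes: the one-sided 95% upper bound solves (1 - p)^n = 0.05,
# which has the closed form 1 - exp(log(0.05) / n).
n = 10
one_sided_upper = solve_p(0, n, 0.05)
closed_form = 1 - math.exp(math.log(0.05) / n)
print(f"0/{n}: one-sided upper = {one_sided_upper:.4f}, closed form = {closed_form:.4f}")
```

The interval for 2/10 extends much further above 0.2 than below it, which is the asymmetry described above. The last lines also show that the numerator of the second formula in the question, 1 - e^{log(0.05)/n}, equals the one-sided 95% upper bound for a sample with zero successes. Dividing that bound by 1.96 looks like an attempt to turn it into an SE-like number, but that reading is an inference and is not confirmed by the question.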

See also the 2001 paper by Brown, Cai and DasGupta, "Interval Estimation for a Binomial Proportion", Statistical Science 16 (2): 101–133, which is a fairly accessible overview of the problem and some approaches to fixing it.

(It is worth noting that the above paper actually recommends the Wilson interval, the Agresti-Coull interval or the Jeffreys interval, but I couldn't find an online calculator for those; there are many in software packages. For example, in Matlab [p, ci] = binofit([positives], [sample size]) gives you a confidence interval, although according to the MATLAB documentation binofit uses the Clopper-Pearson method rather than Agresti-Coull.)
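Two of the intervals named above have simple closed forms, so they are easy to compute by hand. The sketch below uses the standard textbook formulas for the Wilson and Agresti-Coull intervals; the function names and the example counts are hypothetical, and z = 1.96 is assumed for a 95% interval.

```python
import math

def wilson_interval(x, n, z=1.96):
    """Wilson score interval for x successes in n trials."""
    p = x / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

def agresti_coull_interval(x, n, z=1.96):
    """Agresti-Coull interval: a Wald interval around an adjusted proportion.

    The bounds can fall slightly outside [0, 1]; callers often clip them.
    """
    n_adj = n + z**2
    p_adj = (x + z**2 / 2) / n_adj
    half = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - half, p_adj + half

# Illustrative data: 2 positives out of 10
w_lo, w_hi = wilson_interval(2, 10)
ac_lo, ac_hi = agresti_coull_interval(2, 10)
print(f"Wilson:        ({w_lo:.4f}, {w_hi:.4f})")
print(f"Agresti-Coull: ({ac_lo:.4f}, {ac_hi:.4f})")
```

Both intervals share the same centre, which is pulled from the sample proportion towards 0.5, so neither is symmetric about the observed 2/10.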
