Solved – Analytic or sample standard deviation with binomial data

binomial distribution, standard deviation, standard error, variance

I've been looking for recommendations on whether it's better to use the sample standard deviation (SD) or the analytic SD (or variance) for binomial data. It's for experiments with accuracy data, so each subject contributes several binomial (correct or incorrect) responses, resulting in an accuracy score between 0 and 1 (Ncorrect / Ntotal). I compute the mean across subjects. Assuming each subject contributes the same number of responses, should I use the sample SD of the individual accuracy scores or the analytic equation (sqrt(pq / n), where n is the number of responses per subject)?

This is a query for descriptive purposes only. I could use a multi-level logistic regression to model the data, but I'm just looking for simple descriptives.

Example: Each subject gets 20 tries at the task and there are 6 subjects with accuracy 0.95, 0.80, 0.80, 0.65, 0.90, 0.70. The mean accuracy would be 0.8 and therefore the analytic SD of that accuracy is sqrt(0.8 * 0.2 / 20) = 0.089. However, the SD of those six numbers calculated on the sample is 0.114. Which is the better SD estimate to use?
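A minimal Python sketch (standard library only; variable names are illustrative) that reproduces the two figures above. Here `m` plays the role of the question's n (responses per subject), and `statistics.stdev` uses the $n-1$ denominator, which matches the quoted 0.114.

```python
import math
import statistics

# Accuracy scores from the question: 6 subjects, 20 attempts each.
accuracy = [0.95, 0.80, 0.80, 0.65, 0.90, 0.70]
m = 20  # attempts per subject (the question's n)

p_bar = statistics.mean(accuracy)  # mean accuracy across subjects

# Analytic SD of one subject's accuracy if every subject had success chance p_bar.
analytic_sd = math.sqrt(p_bar * (1 - p_bar) / m)

# Sample SD of the six observed accuracies (n - 1 denominator).
sample_sd = statistics.stdev(accuracy)

print(f"mean = {p_bar:.3f}, analytic SD = {analytic_sd:.3f}, sample SD = {sample_sd:.3f}")
```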

Best Answer

"Better" depends on context and purpose. Before addressing this issue, though, let's consider the data.

As a point of departure we might assume, hypothetically (being willing and happy to be proven wrong later in the analysis), that the outcomes of each subject's attempts at the task are independent. This permits us to hold up for scrutiny a simple model in which each subject $i$ has a constant chance $p_i$ of success with each attempt. It follows that the raw counts of successes $(x_i,\ i=1, 2, \ldots, n)$ are six (or, more generally, $n$) independent realizations of Binomial$(m, p_i)$ variables $X_i$, with $m=20$ here. The raw counts are $(19,16,16,13,18,14)$, obtained by multiplying the reported success rates by $20$.

This is a complicated model because it has as many parameters ($n$ of them) as there are data. To see whether the complication is worthwhile, we ought to compare this model to a simplified version. The simplest is that all the $p_i$ are equal: the subjects have equivalent abilities at the task. Is there a small set of simple, easily understood, summary statistics that might help give us some quick insight into which model would be appropriate?

In the spirit of an Analysis of Variance we might be inclined to compare the variance of the dataset--which will comprise the variances inherent in each of the $X_i$ together with the variance of the $p_i$--to some measure of the variance to be expected when all the $p_i$ are equal. Therefore we compute:

  1. The mean of the $p_i$ is $p = (1/n)\sum_{i=1}^n p_i.$ There are several ways to estimate this, but one of the simplest--as justified by the hypothesis that all the $p_i$ are equal--is the sample mean,

    $$\hat{p} = \frac{1}{n}\sum_{i=1}^n \frac{x_i}{m} = \frac{4}{5} = 0.8.$$

  2. The variance of each $X_i$ is $m p_i(1-p_i)$; under the hypothesis of equality, this is $m p(1-p)$, which can be estimated as

    $$\hat{\sigma} = m \hat{p} (1 - \hat{p}) = \frac{16}{5} = 3.2.$$

  3. The variance of the data is

    $$\text{Var}(x_i) = \frac{1}{n}\sum_{i=1}^n (x_i - m \hat{p})^2 = \frac{13}{3} = 4.\bar{3}.$$

Please notice that, for descriptive and exploratory purposes, whether one divides by $n$ or by $n-1$ in this variance calculation matters little. Should one prefer the $n-1$ denominator, however, the result is $26/5 = 5.2$.
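Steps 1 to 3, along with the $n-1$ alternative, can be checked with a short Python sketch. It rebuilds the raw counts from the reported accuracies and keeps every result as an exact fraction; parsing the accuracies as decimal strings with `fractions.Fraction` is a choice made here to avoid floating-point rounding, not part of the original answer.

```python
from fractions import Fraction

# Reported accuracies, parsed as exact decimals (assumption: avoids float rounding).
accuracy = ["0.95", "0.80", "0.80", "0.65", "0.90", "0.70"]
m = 20                                        # attempts per subject
x = [Fraction(a) * m for a in accuracy]       # raw counts: 19, 16, 16, 13, 18, 14
n = len(x)

p_hat = sum(xi / m for xi in x) / n           # step 1: estimated mean success rate
sigma_hat = m * p_hat * (1 - p_hat)           # step 2: variance of X_i if all p_i are equal
ss = sum((xi - m * p_hat) ** 2 for xi in x)   # sum of squared deviations of the counts
var_n = ss / n                                # step 3 with denominator n
var_n1 = ss / (n - 1)                         # step 3 with denominator n - 1

print(p_hat, sigma_hat, var_n, var_n1)
```

This prints `4/5 16/5 13/3 26/5`, matching the values in the text.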

The statistic (3) can be understood as arising from two components: the variation in subject performances due to chance plus the variation in capabilities between the subjects. That is why the two standard deviations computed in the question may differ: they are the square roots of (2) and (3), divided by $m = 20$ to return to the proportion scale, with (3) computed using the $n-1$ denominator. Rather than competing, the two convey separate pieces of information about the data.
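On the proportion scale used in the question, dividing by $m$ recovers both figures (recall that $\hat{\sigma}$ denotes a variance in this answer):

$$\frac{\sqrt{\hat{\sigma}}}{m} = \frac{\sqrt{16/5}}{20} = \sqrt{\frac{0.8 \times 0.2}{20}} \approx 0.0894, \qquad \frac{1}{m}\sqrt{\frac{1}{n-1}\sum_{i=1}^n (x_i - m\hat{p})^2} = \frac{\sqrt{26/5}}{20} \approx 0.1140.$$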

It is attractive to take one more step. ANOVA teaches us that the relevant statistic to examine would be the ratio

$$\text{Var}(x_i) / \hat{\sigma}.$$

A value much greater than $1$ would indicate that the $p_i$ should be treated as non-constant. In the present case, using the $n-1$ denominator (as the sample SD in the question does), this ratio equals $(26/5)/(16/5) = 13/8 = 1.625.$ This is precisely the square of the ratio of standard deviations reported in the question, $(0.1140/0.08944)^2$.
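A quick numerical check of the ratio, and of its identity with the squared ratio of the question's two SDs, sketched in Python from the raw counts given earlier:

```python
import math

m, n = 20, 6
x = [19, 16, 16, 13, 18, 14]                   # raw success counts
p_hat = sum(x) / (n * m)                       # estimated common success rate, 0.8
sigma_hat = m * p_hat * (1 - p_hat)            # variance expected if all p_i are equal
var_data = sum((xi - m * p_hat) ** 2 for xi in x) / (n - 1)  # observed variance, n - 1 denominator
ratio = var_data / sigma_hat

# Convert both variances to SDs on the proportion scale, as in the question.
analytic_sd = math.sqrt(sigma_hat) / m
sample_sd = math.sqrt(var_data) / m
print(round(ratio, 4), round((sample_sd / analytic_sd) ** 2, 4))
```

Both numbers print as `1.625`.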


This analysis has provided a perspective in which the distinction between the two standard deviation calculations in the question can be both understood and used to gain insight into the data-generation mechanism:

  1. The two calculations differ because the sample SD also reflects possible fluctuations in the subjects' capabilities, whereas the analytic SD reflects chance variation alone.

  2. Their ratio (when squared) can be interpreted as an ANOVA F-statistic, permitting its use in evaluating whether the apparent fluctuations may be due to chance or should be accepted as real.
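One way to put a number on "due to chance", which is a swapped-in standard technique rather than part of the answer above, is Pearson's chi-square test of homogeneity of the $n$ proportions. With equal $m$ per subject its statistic is $\sum_i (x_i - m\hat{p})^2 / (m\hat{p}(1-\hat{p}))$, which is exactly $(n-1)$ times the ratio above, and under equal $p_i$ it is approximately chi-square with $n-1$ degrees of freedom. The helper `chi2_sf` is hypothetical and uses only the standard library; with SciPy available, `scipy.stats.chi2.sf` computes the same tail probability. With only about 4 expected failures per subject, treat the p-value as a rough guide.

```python
import math

def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-square(df) variable (hypothetical helper).

    Sums the series for the regularized lower incomplete gamma function
    P(a, z) with a = df/2 and z = stat/2, then returns 1 - P.
    """
    a, z = df / 2.0, stat / 2.0
    if z <= 0:
        return 1.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= z / (a + k)
        total += term
        k += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)

m = 20
x = [19, 16, 16, 13, 18, 14]                   # raw success counts
n = len(x)
p_hat = sum(x) / (n * m)

# Pearson chi-square statistic for homogeneity of the n binomial proportions.
chi2_stat = sum((xi - m * p_hat) ** 2 for xi in x) / (m * p_hat * (1 - p_hat))
ratio = chi2_stat / (n - 1)                    # the variance ratio from the text
p_value = chi2_sf(chi2_stat, n - 1)
print(f"chi2 = {chi2_stat:.3f} on {n - 1} df, ratio = {ratio:.3f}, p = {p_value:.3f}")
```

Here the statistic is $26/3.2 = 8.125$ on 5 degrees of freedom, giving a p-value of roughly 0.15, so these six subjects alone give no strong evidence that their abilities differ.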

To answer the question, then, one might wish to report both standard deviation calculations together with the F-like statistic given by the square of their ratio.
