Is the standard deviation of a binomial dataset informative?

binomial distribution, standard deviation

I am working on a dataset of presence/absence data, with my response variable being 'proportion of sites where X is present'. I have been asked to provide standard deviations alongside the mean proportions. However, it appears to me that the standard deviation of a binomial dataset is a polynomial function of the proportion itself and does not grant additional information about the variability of the underlying data. For example, if a proportion from data is 0.3, it should not matter whether that proportion was derived from presence/absence data from 10, 100, or 100,000 sites; the standard deviation should be the same.

When I make a sample dataset and graph mean proportion vs. standard deviation, I can model it with a 6th-order polynomial function with an $R^2$ of 1.00.

So, can someone confirm my suspicion: that standard deviations are an inherent property of the proportion in a binomial dataset, and thus yield no additional information about the dataset from which that proportion came?

Best Answer

If you have a binomial random variable $X$ of size $N$ with success probability $p$, i.e. $X \sim \mathrm{Bin}(N, p)$, then the mean of $X$ is $Np$ and its variance is $Np(1-p)$, so, as you say, the variance is a second-degree polynomial in $p$. Note, however, that the variance also depends on $N$! The latter is important for estimating $p$:

If you observe 30 successes in 100 trials, then the fraction of successes is 30/100, which is the number of successes divided by the size of the Binomial, i.e. $\frac{X}{N}$.

But if $X$ has mean $Np$, then $\frac{X}{N}$ has a mean equal to the mean of $X$ divided by $N$, because $N$ is a constant. In other words, $\frac{X}{N}$ has mean $\frac{Np}{N}=p$. This implies that the observed fraction of successes is an unbiased estimator of the probability $p$.

To compute the variance of the estimator $\frac{X}{N}$, we have to divide the variance of $X$ by $N^2$ (the variance of a variable divided by a constant is the variance of the variable divided by the square of the constant), so the variance of the estimator is $\frac{Np(1-p)}{N^2}=\frac{p(1-p)}{N}$. The standard deviation of the estimator is the square root of the variance, so it is $\sqrt{\frac{p(1-p)}{N}}$.
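To make the dependence on $N$ concrete, here is a minimal Python sketch of the formula above. The function name `proportion_sd` is just an illustrative choice, not something from a library, and the values 0.3 and 10, 100, 100,000 are taken from the example in the question.

```python
import math


def proportion_sd(p: float, n: int) -> float:
    """Standard deviation of the estimator X/N when X ~ Bin(N, p)."""
    if n <= 0:
        raise ValueError("n must be a positive number of trials")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # Var(X/N) = p(1-p)/N, so the standard deviation is its square root.
    return math.sqrt(p * (1.0 - p) / n)


# Same observed proportion, three different numbers of sites.
for n_sites in (10, 100, 100_000):
    print(n_sites, proportion_sd(0.3, n_sites))
```

For a proportion of 0.3 this gives roughly 0.145, 0.046 and 0.0014 for 10, 100 and 100,000 sites, so the same proportion comes with very different precision.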

So, if you toss a coin 100 times and you observe 49 heads, then $\frac{49}{100}$ is an estimate of the probability of tossing heads with that coin, and the standard deviation of this estimate is $\sqrt{\frac{0.49\times(1-0.49)}{100}}$.

If you toss the coin 1000 times and you observe 490 heads, then you again estimate the probability of tossing heads at $0.49$, and the standard deviation at $\sqrt{\frac{0.49\times(1-0.49)}{1000}}$.

Obviously, in the second case the standard deviation is smaller, so the estimator becomes more precise as you increase the number of tosses.
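The coin comparison can also be checked empirically. The sketch below is only an illustration using the Python standard library: it assumes a coin whose true heads probability is 0.49, repeats the 100-toss and 1000-toss experiments many times, and measures how much the estimated proportion varies between repetitions. The function name, the number of replicates and the seed are arbitrary choices.

```python
import random
import statistics


def simulate_proportion_sd(p, n, replicates=2000, seed=0):
    """Empirical standard deviation of X/N over repeated Bin(N, p) experiments."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    estimates = []
    for _ in range(replicates):
        # One experiment: n tosses, count the heads, store the proportion.
        heads = sum(rng.random() < p for _ in range(n))
        estimates.append(heads / n)
    return statistics.stdev(estimates)


sd_100 = simulate_proportion_sd(0.49, 100)
sd_1000 = simulate_proportion_sd(0.49, 1000)
print("100 tosses: ", sd_100)
print("1000 tosses:", sd_1000)
```

The two printed values should land near the theoretical $\sqrt{0.49 \times 0.51 / 100} \approx 0.050$ and $\sqrt{0.49 \times 0.51 / 1000} \approx 0.016$, up to simulation noise.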

You can conclude that, for a Binomial random variable, the variance is a quadratic polynomial in $p$, but it also depends on $N$, so the standard deviation does contain information beyond the success probability.

In fact, the Binomial distribution has two parameters, and you will always need at least two moments (in this case the mean (= first moment) and the standard deviation (square root of the second central moment)) to fully identify it.
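As a rough illustration of the two-moment point, the mean $Np$ and variance $Np(1-p)$ of the count $X$ can be inverted: dividing the variance by the mean gives $1-p$, and then $N$ is the mean divided by $p$. The helper below is a hypothetical sketch of that inversion, not a recommended estimator for real data, where sample moments are noisy.

```python
def binomial_from_moments(mean, variance):
    """Recover (N, p) of a Bin(N, p) count from its exact mean and variance."""
    # For a binomial with p > 0 we need 0 <= variance < mean.
    if mean <= 0 or not 0 <= variance < mean:
        raise ValueError("moments are not consistent with a binomial with p > 0")
    p = 1.0 - variance / mean  # since variance / mean = 1 - p
    n = mean / p               # since mean = N * p
    return n, p


# Moments of Bin(100, 0.3): mean = 30, variance = 21.
n, p = binomial_from_moments(30.0, 21.0)
print(n, p)
```

With the mean alone (30 here) you could not tell Bin(100, 0.3) apart from, say, Bin(60, 0.5); the variance is what separates them.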

P.S. A somewhat more general development, which also covers the Poisson binomial distribution, can be found in my answer to "Estimate accuracy of an estimation on Poisson binomial distribution".
