Understanding Why the Standard Error of Sample Proportions is Not Divided by n-1

descriptive-statistics, mathematical-statistics

There is much discussion on this site about why the sum of squared deviations is divided by $n - 1$ when computing the sample variance or the sample standard deviation.

One of the explanations I have heard for this is that we 'used' some of the data to estimate the population mean with the sample mean, and that this cost us one degree of freedom.

Can someone please explain, then, why, when calculating the standard error of the sample proportion (as part of the z-test for one population proportion), we divide by $n$ and not $n - 1$? The first step in that process is to compute $\widehat{p}$; the standard error of the sample proportion is then 'built' on the foundation of that first step. I would have thought that estimating the population proportion with the sample proportion would have cost us one degree of freedom. I know that when the assumptions are met, the sample proportions are approximately normally distributed (a shape not governed by degrees of freedom), but I am left wondering why this calculation (and, frankly, the calculation of the standard error of the sample mean) is not analogous to what we did for the sample standard deviation. Why are both of these standard errors calculated by dividing by $n$ and not $n - 1$?

Best Answer

Each observation in a test for a proportion takes a value of either 0 or 1. These values are assumed to be independent of one another (e.g., the first observation recording a '1' has no bearing on whether the second observation, or any other observation, also records a '1').

These two facts mean that the outcome variable in a test for a proportion is distributed Bernoulli with probability $p$. Unlike the normal distribution, in which the mean ($\mu_{x}$) and standard deviation ($\sigma_{x}$) are independent of one another (for example, normally distributed data can have a mean of 5 but a standard deviation of 2, 20, 200,000, or any non-negative value, and vice versa), in a Bernoulli distribution the standard deviation, and therefore the standard deviation of sample proportions of size $n$, is purely a function of the mean: $\sigma_{x} = \sqrt{p(1-p)}$ and $\sigma_{\widehat{p}} = \sqrt{\frac{p(1-p)}{n}}$, where the mean of Bernoulli data is 'dressed up' with the symbol $p$. Since the proportion has already been calculated, you get the standard deviation of the sample proportion 'for free': no separate variance parameter is estimated, so no additional degree of freedom is spent. (Indeed, if you did apply the $n - 1$ formula to 0/1 data, then because $\sum x_i^2 = \sum x_i = n\widehat{p}$, you would get $s^2 = \frac{1}{n-1}\sum (x_i - \widehat{p})^2 = \frac{n}{n-1}\,\widehat{p}(1-\widehat{p})$, which differs from the plug-in estimator $\widehat{p}(1-\widehat{p})$ only by the factor $\frac{n}{n-1}$.)
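If it helps to see this numerically, here is a minimal simulation sketch (assuming NumPy and the illustrative values $p = 0.3$, $n = 50$, which are my choices, not from the question): the empirical standard deviation of many simulated sample proportions should closely match $\sqrt{p(1-p)/n}$, with no $n - 1$ correction.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

p, n, n_sims = 0.3, 50, 100_000  # illustrative values (assumptions)

# Draw n Bernoulli(p) observations per simulated sample; each sample
# proportion is just the mean of its 0/1 observations.
samples = rng.binomial(1, p, size=(n_sims, n))
p_hats = samples.mean(axis=1)

# Theoretical standard error of the sample proportion, dividing by n.
se_theory = np.sqrt(p * (1 - p) / n)

# Empirical standard deviation of the simulated sample proportions.
se_empirical = p_hats.std()

print(f"theoretical SE: {se_theory:.5f}")
print(f"empirical   SE: {se_empirical:.5f}")
```

The two printed values should agree to a few decimal places, illustrating that $\sqrt{p(1-p)/n}$ is already the right spread for $\widehat{p}$.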
