Probability – Understanding the Sampling Distribution of Sample Mean

asymptotics, central limit theorem, probability, sampling-theory, statistics

We are sampling $Y_1,\dots,Y_n$ without replacement from a population of size $N$ in which $Y\sim$ Bernoulli($\mu$). What is the asymptotic distribution of $\bar Y_n$?

The CLT only applies to independent $Y_i$'s, but our $Y_i$'s are clearly not independent, so I don't know how to get the whole distribution.

Best Answer

If the population has $\mu N$ values of $1$ and $(1-\mu)N$ values of $0$, then the distribution of $\sum\limits_1^n Y_i$ is hypergeometric with mean $n\mu$ and variance $n\mu(1-\mu)\frac{N-n}{N-1}$. The distribution of $\bar Y_n = \frac1n \sum\limits_1^n Y_i$ is a scaled version of this, so it has mean $\mu$ and variance $\frac1n \mu(1-\mu)\frac{N-n}{N-1}$.
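As a sanity check of these moments, the sketch below computes the mean and variance of $\bar Y_n$ exactly from the hypergeometric probabilities, using rational arithmetic so there is no rounding, and compares them with the closed forms. It assumes $K = \mu N$ is an integer; the helper names `sample_mean_moments` and `formula` and the values $N=20$, $K=6$, $n=8$ are hypothetical choices for illustration.

```python
from fractions import Fraction
from math import comb

def sample_mean_moments(N, K, n):
    """Exact mean and variance of Ybar_n when n units are drawn without
    replacement from a population of N units, K of which equal 1."""
    total = comb(N, n)
    mean = Fraction(0)
    second = Fraction(0)
    # the sample sum s ranges over the support of the hypergeometric distribution
    for s in range(max(0, n - (N - K)), min(n, K) + 1):
        p = Fraction(comb(K, s) * comb(N - K, n - s), total)  # P(sum = s)
        ybar = Fraction(s, n)
        mean += p * ybar
        second += p * ybar ** 2
    return mean, second - mean ** 2

def formula(N, K, n):
    """Closed forms: mean mu and variance (1/n) mu (1 - mu) (N - n)/(N - 1)."""
    mu = Fraction(K, N)
    return mu, mu * (1 - mu) * Fraction(N - n, N - 1) / n

print(sample_mean_moments(20, 6, 8) == formula(20, 6, 8))  # True
```

Exact fractions make the comparison an equality test rather than a floating-point tolerance check.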

If instead the number of $1$s in the population has a binomial distribution with parameters $N$ and $\mu$, with the rest being $0$s (for instance, when each population value is an independent Bernoulli($\mu$) draw), then the distribution of $\sum\limits_1^n Y_i$ is binomial with parameters $n$ and $\mu$, with mean $n\mu$ and variance $n\mu(1-\mu)$. The distribution of $\bar Y_n = \frac1n \sum\limits_1^n Y_i$ is a scaled version of this, so it has mean $\mu$ and variance $\frac1n \mu(1-\mu)$.
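One way to see the binomial claim is as a two-stage model: first draw the population count $K \sim \text{Binomial}(N, \mu)$, then draw $n$ units without replacement, which is hypergeometric given $K$. The sketch below averages the hypergeometric pmf over $K$ and checks, exactly with rational arithmetic, that the marginal distribution of the sample sum is $\text{Binomial}(n, \mu)$ and that $\bar Y_n$ has mean $\mu$ and variance $\frac1n\mu(1-\mu)$. The function names and the values $N=12$, $n=5$, $\mu=3/10$ are hypothetical illustration choices.

```python
from fractions import Fraction
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def hypergeom_pmf(s, N, K, n):
    """P(sample sum = s): n of N units drawn without replacement, K of them are 1s.
    math.comb returns 0 when the lower index exceeds the upper one."""
    return Fraction(comb(K, s) * comb(N - K, n - s), comb(N, n))

def mixture_pmf(s, N, n, mu):
    """Average the hypergeometric pmf over a population count K ~ Binomial(N, mu)."""
    return sum(binom_pmf(K, N, mu) * hypergeom_pmf(s, N, K, n) for K in range(N + 1))

N, n, mu = 12, 5, Fraction(3, 10)
print(all(mixture_pmf(s, N, n, mu) == binom_pmf(s, n, mu) for s in range(n + 1)))  # True

# moments of Ybar_n under Binomial(n, mu)
mean_ybar = sum(Fraction(s, n) * binom_pmf(s, n, mu) for s in range(n + 1))
var_ybar = sum(Fraction(s, n) ** 2 * binom_pmf(s, n, mu) for s in range(n + 1)) - mean_ybar ** 2
print(mean_ybar == mu, var_ybar == mu * (1 - mu) / n)  # True True
```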

You ask for an asymptotic distribution, but that depends on which limit you take. If $n \to \infty$ and $N \to \infty$ with $\frac{n}{N} \to k$ for some fixed $k$ with $0 < k < 1$ (that is, $N \approx \frac{n}{k}$), then in both cases I would expect a CLT result: with suitable adjustment to location and scale, the distribution converges to a normal distribution:

  • so in the hypergeometric case, since $\frac{N-n}{N-1} \to 1-k$, $\sqrt{n}\left(\bar Y_n - \mu\right)$ will converge in distribution to $\mathcal N\left(0, \mu(1-\mu)(1-k)\right)$

  • while in the binomial case, where the $Y_i$ behave as independent Bernoulli($\mu$) variables and the ordinary CLT applies, $\sqrt{n}\left(\bar Y_n - \mu\right)$ will converge in distribution to $\mathcal N\left(0, \mu(1-\mu)\right)$
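A rough Monte Carlo sketch of these two limits, assuming illustrative values $\mu = 0.3$, $k = 0.5$, $n = 400$ (so $N = 800$): it simulates $\sqrt{n}(\bar Y_n - \mu)$ under both models and compares the sample variances with $\mu(1-\mu)(1-k)$ and $\mu(1-\mu)$. The helper `simulate` and all parameter values are hypothetical; with finite $n$ and a finite number of replications the agreement is only approximate.

```python
import random
import statistics

def simulate(n, N, mu, reps, hypergeometric, seed=0):
    """Return reps simulated values of sqrt(n) * (Ybar_n - mu)."""
    rng = random.Random(seed)
    K = round(mu * N)  # number of 1s in the fixed population
    population = [1] * K + [0] * (N - K)
    draws = []
    for _ in range(reps):
        if hypergeometric:
            # fixed population with mu*N ones; draw n units without replacement
            total = sum(rng.sample(population, n))
        else:
            # binomial model: the sampled units behave like n independent Bernoulli(mu) draws
            total = sum(rng.random() < mu for _ in range(n))
        draws.append(n ** 0.5 * (total / n - mu))
    return draws

mu, k, n = 0.3, 0.5, 400
N = round(n / k)  # N = n/k, so the sampling fraction n/N equals k
hyper = simulate(n, N, mu, reps=4000, hypergeometric=True)
binom = simulate(n, N, mu, reps=4000, hypergeometric=False)

# simulated variance vs. the limiting variance in each case
print(statistics.variance(hyper), mu * (1 - mu) * (1 - k))
print(statistics.variance(binom), mu * (1 - mu))
```

The hypergeometric variance should come out at roughly half the binomial one, reflecting the finite-population factor $1-k$ when half the population is sampled.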
