Let $m\le k\le n$; when sampling an $m$-sized random subset from $\{1,2,\ldots,n\}$, how many will be from $\{1,2,\ldots,k\}$

probabilityprobability distributionsprobability theorysamplingupper-lower-bounds

Let $m\le k\le n$ be integer parameters. Given $z\in\mathbb N^+$, let $[z]$ denote the set $\{1,2,\ldots,z\}$.

Consider selecting a uniformly random $m$-sized subset $X\in {[n]\choose m}$.

Let $Y=X\cap[k]$ be the subset of the sample from $[k]$.

Since each element in $[n]$ is chosen to $X$ with probability $p=m/n$, we have that $\mathbb E[|Y|]=k\cdot p =k\cdot m/n$.

I'm looking for a concentration bound for $|Y|$.

If each element in $[k]$ was chosen i.i.d. with probability $p$, we could use a standard Chernoff bound for that. For example, if we have $Z\sim Bin(k,p)$ then we can get that for any $t>0$:
$\Pr[|Z-kp|\ge t\cdot kp]\le 2e^{-kpt^2/3}$.

How can we get a similar concentration bound for $Y$?


Intuitively, I'd want to say to say that $Y$ is more concentrated around its mean than $Z$, but am not sure how to prove that.

Best Answer

The variable of interest $|Y|$ follows a hypergeometric distribution

Its variance is

$$ \sigma_Y^2= m \frac{k}{n} \frac{n-k}{n} \frac{n-m}{n-1}$$

Comparing this with a Binomial with the same mean and range:

$$ \frac{\sigma_Y^2}{\sigma_B^2}= \frac{\frac{n-k}{n} \frac{n-m}{n-1}}{1-m/n}=\frac{n-k}{n-1} < 1$$ for $k>1$ , just as intuition says.