Variance of sum of $k$ randomly drawn numbers from $1,…,n$.

expected valueprobabilityrandom variablesvariance

From an urn with numbers $1,…,n$ we draw $k < n$ numbers without replacement.

Let $X_i$ be the $i$-th draw. The random variable is their sum $X=\sum_{i=1}^kX_i$.

I have already calculated the expected value of the sum, which is

$$\Bbb{E}[X]=\sum_{i=1}^k\Bbb{E}[X_i]=k\frac{n+1}{2}$$ because each $\Bbb{E}[X_i]=\frac{1}{n}\sum_{i=1}^n i=\frac{n+1}{2}$.

Now the variance of the sum would be $$Var[X]=\Bbb{E}[X^2]-\Bbb{E}[X]^2$$

I have read that the variance of a sum is the sum of variances if the random variables are independent, it does not seem to be the case here, as previous draws determine future draws.

Is there an elegant way to determine the first summand of the variance?


Edit: I am trying it the ugly way.

$\Bbb{E}[X^2]=\Bbb{E}[(\sum_{i=1}^kX_i)^2]=\Bbb{E}[\sum_{i=1}^k \sum_{j=1}^k X_iX_j]=\sum_{i=1}^k \sum_{j=1}^k \Bbb{E}[X_iX_j]$

To know $\Bbb{E}[X_iX_j]$ we would have to know $\Bbb{P}(X_iX_j=k)$, meaning we would have to know the number of ways to write a number as the product of two factors $1\leq X_i, X_j \leq n$… I am pretty sure I am off the track here, as I do not see a way to do it for a general $n$.


Am I wrong to regard the $X_i$ instead of the $X$, which are independent, as two draws of $k$ balls would be independent? Then $\Bbb{E}[X^2]=\Bbb{E}[X]\Bbb{E}[X]$

Best Answer

Let's do it the ugly way. If any of the steps is confusing, let me know in the comments, I'll elaborate.

You have $$\mathbb{E}[X^2] = \sum_{i=1}^k \sum_{j=1}^k \mathbb{E}[X_iX_j] = \sum_{i=1}^k \mathbb{E}[X_i^2]+2\sum_{1\leq i < j\leq k} \mathbb{E}[X_iX_j]$$

The first term is easy to compute: $$ \sum_{i=1}^k \mathbb{E}[X_i^2] = k\cdot \frac{1}{n}\sum_{i=1}^n i^2 = \frac{k(n+1)(2n+1)}{6}\,. $$ The second... is similar. $$\begin{align*} 2\sum_{1\leq i < j\leq k} \mathbb{E}[X_iX_j] &= \binom{k}{2}\cdot \frac{1}{\binom{n}{2}} \sum_{\substack{1\leq i,j\leq n\\ i\neq j}} ij\\ &= \frac{k(k-1)}{n(n-1)}\left( \sum_{1\leq i,j\leq n} ij-\sum_{1\leq i\leq n} i^2 \right) \tag{Can you see why?}\\ &= \frac{k(k-1)}{n(n-1)}\left( \left(\sum_{i=1}^n i\right)^2-\sum_{i=1}^n i^2 \right) \tag{Can you see why?}\\ &= \frac{k(k-1)}{n(n-1)}\left( \left(\frac{n(n+1)}{2}\right)^2-\frac{n(n+1)(2n+1)}{6} \right) \\ &= \frac{k(k-1)}{n(n-1)}\left( \frac{n(n+1)(3n^2-n-2)}{12} \right) \end{align*}$$ so $$\begin{align} \mathbb{E}[X^2] - \mathbb{E}[X]^2 &= \frac{k(n+1)(2n+1)}{6} + \frac{k(k-1)(n+1)(3n^2-n-2)}{12(n-1)} - \frac{k^2(n+1)^2}{4}\\ &= \boxed{\frac{k(n-k)(n+1)}{12}} \end{align}$$

Sanity checks: the obtained expression is non-negative (good: it's a variance), and equal to $0$ for $k=n$ (good, this makes sense: if we decide to draw all the numbers, the sum is fixed). Moreover, for $k=1$, we do get $(n^2-1)/12$, which is indeed the variance of a uniform r.v. on $\{1,2,\dots,n\}$.