Solved – Covariance of order statistics

covariance, order-statistics, variance

I'm a researcher in social science and I have encountered the following math formulation of a problem in my field. Note that I'm relatively new to stack exchange and I have already posted this on math.stackexchange as well as mathoverflow, before being asked to post it here. Please let me know if I should delete the double post from the sites (otherwise I will definitely update all sites when I receive an answer). Thank you!

Let $x_1,x_2,\dots,x_n,x_{n+1}$ be $n+1$ i.i.d. random variables with non-negative support and strictly positive probability mass around zero. Let $$z_k\equiv \min\{x_1,\dots,x_k\}.$$

In simulations, I find $Cov(z_{n+1},z_n)$ to be very close to $Var(z_{n+1})$ and very different from $Var(z_{n})$, as long as $n \gg 1$. I have tried this for many distributions (with non-negative support): $Cov(z_{n+1},z_n) / Var(z_{n+1}) \approx 1$, and the approximation improves as $n$ increases, holding well even for $n$ as small as $5$. On the other hand, $Cov(z_{n+1},z_n)/Var(z_n)$ is very far from 1.
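The simulation code itself is not shown in the question. A minimal sketch of what such a check might look like follows, assuming an Exponential(1) parent and $n=5$; the function name `sample_cov_var`, the number of draws and the seed are all illustrative choices, not the original setup.

```python
import random

def sample_cov_var(n, draws=100_000, seed=0):
    """Estimate Cov(z_n, z_{n+1}), Var(z_n) and Var(z_{n+1}) by simulation.

    Assumed parent distribution: Exponential(1) (an illustrative choice).
    """
    rng = random.Random(seed)
    zn, zn1 = [], []
    for _ in range(draws):
        xs = [rng.expovariate(1.0) for _ in range(n + 1)]
        m = min(xs[:n])            # z_n: minimum of the first n draws
        zn.append(m)
        zn1.append(min(m, xs[n]))  # z_{n+1}: minimum after adding x_{n+1}
    mean_n = sum(zn) / draws
    mean_n1 = sum(zn1) / draws
    # Unbiased sample covariance and variances
    cov = sum((a - mean_n) * (b - mean_n1) for a, b in zip(zn, zn1)) / (draws - 1)
    var_n = sum((a - mean_n) ** 2 for a in zn) / (draws - 1)
    var_n1 = sum((b - mean_n1) ** 2 for b in zn1) / (draws - 1)
    return cov, var_n, var_n1

cov, var_n, var_n1 = sample_cov_var(n=5)
print("Cov / Var(z_{n+1}) =", cov / var_n1)
print("Cov / Var(z_n)     =", cov / var_n)
```

Swapping `rng.expovariate(1.0)` for another non-negative sampler (for example `rng.random()`) repeats the check for a different parent.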

How can I formalize this? That is, I'm looking for some kind of bound on how the approximation improves with $n$. Of course, as $n\to\infty$, $Var(z_n)$ and $Var(z_{n+1})$ coincide in the limit, so I'm looking for results either for finite $n$, or for an asymptotic result that takes $n\to\infty$ with $m\equiv \lfloor c\cdot n \rfloor$ for a constant $c>1$, such that $Cov(z_n,z_m)$ is closer to $Var(z_m)$ than to $Var(z_n)$.

Note that the approximation only works when $X$ has non-negative support and positive probability mass in a neighborhood of zero. I believe that, by results from extreme value theory, the minimum of such distributions is exponentially distributed in the limit. I don't know if this is important.

Best Answer

Let $(X_1,\dots,X_n,X_{n+1})$ denote a random sample of size $n+1$ drawn on $X$, and let $$Z_n = \min\{X_1,\dots,X_n\} \quad \text{and} \quad Z_{n+1} = \min\{X_1,\dots,X_n,X_{n+1}\}.$$

By including the extra $X_{n+1}$ term, there are only 2 possibilities:


  • EITHER CASE A $\rightarrow$ with probability $\frac{n}{n+1}$

The extra term $X_{n+1}$ does NOT change the sample minimum, i.e. $Z_{n+1} = Z_n$. Then:

$$\text{Cov}(Z_n, Z_{n+1})\big|_\text{Case A} \; = \; \text{Cov}(Z_{n+1}, Z_{n+1}) \; = \; \text{Var}(Z_{n+1})$$

Since Case A occurs with probability $\frac{n}{n+1}$, this immediately explains why your observed unconditional covariance $\text{Cov}(Z_n, Z_{n+1})$ is well approximated by $\text{Var}(Z_{n+1})$, and increasingly so as $n$ increases.


  • OR CASE B $\rightarrow$ with probability $\frac{1}{n+1}$

The extra term $X_{n+1}$ DOES change the sample minimum, i.e. $Z_{n+1} < Z_n$. Then $Z_{n+1}$ and $Z_n$ must be the $1^{\text{st}}$ and $2^{\text{nd}}$ order statistics from a sample of size $n+1$, i.e.

$$\text{Cov}(Z_n, Z_{n+1})\big|_\text{Case B} \; = \; \text{Cov}\big(X_{(1)}, X_{(2)}\big) \text{ in a sample of size: } n+1$$


In summary:

\begin{align*}
\text{Cov}(Z_n, Z_{n+1}) \; &= \frac{n}{n+1}\,\text{Cov}(Z_n, Z_{n+1})\big|_\text{Case A} \quad + \quad \frac{1}{n+1}\,\text{Cov}(Z_n, Z_{n+1})\big|_\text{Case B} \\
&= \frac{n}{n+1}\,\text{Var}(Z_{n+1}) \quad + \quad \frac{1}{n+1}\,\text{Cov}\big(X_{(1)}, X_{(2)}\big)_{\text{sample size } = n+1}
\end{align*}

This makes it easy to see why the result is close to $\text{Var}(Z_{n+1})$: Case A dominates, with probability $\frac{n}{n+1}$.
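A short justification of the weights, sketched under the added assumption that $X$ is continuous (so ties have probability zero): in both cases $Z_{n+1} = X_{(1)}$, while $Z_n = X_{(1)}$ in Case A and $Z_n = X_{(2)}$ in Case B, with order statistics taken in the sample of size $n+1$. For an i.i.d. continuous sample, which index attains the minimum is independent of the order-statistic values, so conditioning on the case does not change the joint law of $(X_{(1)}, X_{(2)})$. Hence

\begin{align*}
E[Z_n Z_{n+1}] &= \frac{n}{n+1}\,E\big[X_{(1)}^2\big] + \frac{1}{n+1}\,E\big[X_{(1)} X_{(2)}\big], \\
E[Z_n]\,E[Z_{n+1}] &= \Big(\frac{n}{n+1}\,E\big[X_{(1)}\big] + \frac{1}{n+1}\,E\big[X_{(2)}\big]\Big)\,E\big[X_{(1)}\big], \\
\text{Cov}(Z_n, Z_{n+1}) &= \frac{n}{n+1}\,\text{Var}\big(X_{(1)}\big) + \frac{1}{n+1}\,\text{Cov}\big(X_{(1)}, X_{(2)}\big).
\end{align*}

The third line is the first minus the second, and $\text{Var}(X_{(1)}) = \text{Var}(Z_{n+1})$.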


Example and Check: Uniform Parent

For an $X \sim \text{Uniform}(0,1)$ parent:

  • Case A: $\text{Var}(Z_{n+1}) = \text{Var}(X_{(1)})_{\text{sample size } = n+1} = \frac{n+1}{(n+2)^2 (n+3)}$

  • Case B: $\text{Cov}\big(X_{(1)}, X_{(2)}\big)_{\text{sample size } = n+1} = \frac{n}{(n+2)^2 (n+3)}$

  • Then: $\text{Cov}(Z_n, Z_{n+1}) = \frac{n}{(n+1) (n+2) (n+3)}$
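The algebra combining the three bullets can be checked with exact rational arithmetic. The sketch below uses the standard Uniform(0,1) order-statistic moments for a sample of size $N$ (with $N = n+1$); the function names are illustrative, not taken from the answer.

```python
from fractions import Fraction

def var_min_uniform(N):
    """Var(X_(1)) for N i.i.d. Uniform(0,1) draws: N / ((N+1)^2 (N+2))."""
    return Fraction(N, (N + 1) ** 2 * (N + 2))

def cov_first_two_uniform(N):
    """Cov(X_(1), X_(2)) for N i.i.d. Uniform(0,1) draws: (N-1) / ((N+1)^2 (N+2))."""
    return Fraction(N - 1, (N + 1) ** 2 * (N + 2))

def cov_zn_zn1(n):
    """Weighted sum of Case A and Case B, both in a sample of size n+1."""
    N = n + 1
    case_a = Fraction(n, n + 1) * var_min_uniform(N)
    case_b = Fraction(1, n + 1) * cov_first_two_uniform(N)
    return case_a + case_b

# The weighted sum should equal n / ((n+1)(n+2)(n+3)) exactly.
for n in range(1, 31):
    assert cov_zn_zn1(n) == Fraction(n, (n + 1) * (n + 2) * (n + 3))

print(cov_zn_zn1(5))  # 5/336
```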

The following diagram compares:

  • this exact theoretical solution for $\text{Cov}(Z_n, Z_{n+1})$, as $n$ increases from 1 to 30 $\rightarrow$ the red curve

  • to a Monte Carlo calculation of $\text{Cov}(Z_n, Z_{n+1})$ $\rightarrow$ the blue dots

[Figure: exact $\text{Cov}(Z_n, Z_{n+1})$ (red curve) against Monte Carlo estimates (blue dots), for $n = 1$ to $30$]

Looks fine.
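The code behind the comparison is not shown in the answer. A minimal sketch of a similar check, assuming plain Python with the standard `random` module, and with illustrative values for the number of draws, the seed and the chosen values of $n$:

```python
import random

def exact_cov(n):
    """Closed form for a Uniform(0,1) parent: n / ((n+1)(n+2)(n+3))."""
    return n / ((n + 1) * (n + 2) * (n + 3))

def mc_cov(n, draws=20_000, seed=0):
    """Monte Carlo estimate of Cov(Z_n, Z_{n+1}) for a Uniform(0,1) parent."""
    rng = random.Random(seed)
    sum_a = sum_b = sum_ab = 0.0
    for _ in range(draws):
        z_n = min(rng.random() for _ in range(n))  # minimum of the first n draws
        z_n1 = min(z_n, rng.random())              # minimum after one extra draw
        sum_a += z_n
        sum_b += z_n1
        sum_ab += z_n * z_n1
    mean_a, mean_b = sum_a / draws, sum_b / draws
    # Sample covariance with the usual draws / (draws - 1) correction
    return (sum_ab / draws - mean_a * mean_b) * draws / (draws - 1)

for n in (1, 2, 5, 10, 30):
    print(n, exact_cov(n), mc_cov(n))
```

The estimates fluctuate with the seed and the number of draws, so only approximate agreement with the closed form should be expected.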


The following diagram compares the exact theoretical solution for $\text{Cov}(Z_n, Z_{n+1})$, $\text{Var}(Z_n)$ and $\text{Var}(Z_{n+1})$: as the OP reports, by the time $n = 5$, $\text{Cov}(Z_n, Z_{n+1})$ is well approximated by $\text{Var}(Z_{n+1})$:

[Figure: exact $\text{Cov}(Z_n, Z_{n+1})$, $\text{Var}(Z_n)$ and $\text{Var}(Z_{n+1})$ plotted against $n$]
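For the uniform parent, the two ratios from the question can also be written in closed form. As a worked check (specific to $\text{Uniform}(0,1)$, not a general bound), using $\text{Var}(Z_n) = \frac{n}{(n+1)^2 (n+2)}$ together with the expressions above:

\begin{align*}
\frac{\text{Cov}(Z_n, Z_{n+1})}{\text{Var}(Z_{n+1})} &= \frac{n(n+2)}{(n+1)^2} = 1 - \frac{1}{(n+1)^2}, \\
\frac{\text{Cov}(Z_n, Z_{n+1})}{\text{Var}(Z_n)} &= \frac{n+1}{n+3} = 1 - \frac{2}{n+3}.
\end{align*}

At $n = 5$ these are $\frac{35}{36} \approx 0.972$ and $\frac{3}{4} = 0.75$, so in this example the first ratio approaches 1 at rate $1/n^2$ while the second does so only at rate $1/n$.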
