Joint density of the order statistics for random vectors

combinatorics, order-statistics, probability, probability-distributions

Consider $n$ independent and identically distributed (i.i.d.) random variables $X_1, \ldots,X_n$, with (absolutely) continuous distribution $F$, whose (Lebesgue) density is denoted by $f$. Denote by $X_{(1)}<X_{(2)}<\ldots < X_{(n)}$ the order statistics of the sample. Then, it is well known that the joint probability density of $(X_{(1)},X_{(2)},\ldots , X_{(n)})$ is
$$
n!\prod_{i=1}^n f(x_i), \quad x_1<x_2<\ldots<x_n.
$$
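As a quick numerical sanity check of the $n!\prod_{i=1}^n f(x_i)$ formula, the following sketch (assuming `numpy` is available; the choice $n=2$ with standard uniforms, the sample size, and the thresholds $0.3$, $0.6$ are arbitrary) compares a Monte Carlo estimate with the probability obtained by integrating the claimed density:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sim = 200_000

# n = 2 i.i.d. Uniform(0,1) variables; sorting each row gives the order statistics
x = np.sort(rng.random((n_sim, 2)), axis=1)

# Claimed density: 2! * 1 on {x1 < x2} (f = 1 for uniforms).  Integrating it over
# [0, 0.3] x [0, 0.6] intersected with {x1 < x2} gives
#   int_0^{0.3} int_{x1}^{0.6} 2 dx2 dx1 = 0.27
p_emp = np.mean((x[:, 0] <= 0.3) & (x[:, 1] <= 0.6))
print(p_emp)  # close to the exact value 0.27
```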

Now, consider $n$ i.i.d. random vectors $(X_1,Y_1), \ldots,(X_n,Y_n)$ with common (absolutely) continuous distribution $G$, whose bivariate (Lebesgue) density is denoted by $g$. Let $(X_{(1)},Y_{(1)}), \ldots,(X_{(n)},Y_{(n)})$ correspond to a reordering of $(X_i,Y_i)$, $1 \leq i \leq n$, based only on the order statistics of the first component, i.e.
$$
X_{(1)}<X_{(2)}<\ldots<X_{(n)}
$$

but not necessarily $Y_{(i)}<Y_{(j)}$ for $i <j$. More precisely:

Given a realization $(x_1,y_1),\ldots,(x_n,y_n)$, if for some $l \in \{0,1,\ldots,n-1\}$ the value $x_i$ is the $(l+1)$-th largest among the observed first components, then $(x_{(n-l)},y_{(n-l)})=(x_i,y_i)$.
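In code, this reordering is simply a sort of the pairs by their first component; a small sketch (assuming `numpy`; the sample size and the normal distribution are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=5)
y = rng.normal(size=5)

idx = np.argsort(x)            # permutation that sorts the first component
x_ord, y_ord = x[idx], y[idx]  # reordered pairs (X_(i), Y_(i)); each y stays with its x

assert np.all(np.diff(x_ord) > 0)  # X_(1) < ... < X_(n) (ties occur with probability 0)
# y_ord is generally NOT sorted: only the first components are ordered
```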

QUESTION: What is the joint density of $(X_{(1)},Y_{(1)}), \ldots,(X_{(n)},Y_{(n)})$?

Best Answer

The answer is

$$ n!\prod_{i=1}^n g(x_i,y_i)\mathbf{1}(x_1<x_2<\ldots<x_n). $$

To see this, first observe that
$$
\mathbb{P}(X_{(i)}\leq x_i,\, Y_{(i)}\leq y_i,\ 1 \leq i \leq n)=n!\,\mathbb{P}(X_1<X_2<\ldots<X_n \text{ and } X_i \leq x_i,\, Y_i \leq y_i,\ 1 \leq i \leq n).
$$
The probability on the right-hand side can be written as
$$
\int_{-\infty}^{x_n}\int_{-\infty}^{y_n} \left\lbrace \ldots \left[ \int_{-\infty}^{\tilde{x}_3 \wedge x_2}\int_{-\infty}^{y_2}G(x_1 \wedge \tilde{x}_2,y_1)\,g(\tilde{x}_2,\tilde{y}_2)\,d\tilde{x}_2\,d\tilde{y}_2 \right] \ldots\right\rbrace g(\tilde{x}_n, \tilde{y}_n)\,d\tilde{x}_n\, d \tilde{y}_n
$$
$$
=\int_{-\infty}^{x_n}\int_{-\infty}^{y_n} \left\lbrace \ldots \left[ \int_{x_1}^{\tilde{x}_3 \wedge x_2}\int_{-\infty}^{y_2}G(x_1,y_1)\,g(\tilde{x}_2,\tilde{y}_2)\,d\tilde{x}_2\,d\tilde{y}_2 \right] \ldots\right\rbrace g(\tilde{x}_n, \tilde{y}_n)\,d\tilde{x}_n \,d \tilde{y}_n +R_n(\mathbf{x}_{-2}, \mathbf{y}),
$$
where $R_n(\mathbf{x}_{-2}, \mathbf{y})$ is a remainder term, coming from the region $\tilde{x}_2 \leq x_1$ (where $x_1 \wedge \tilde{x}_2 = \tilde{x}_2$), that depends only on $\mathbf{x}_{-2}=(x_1, x_3, \ldots,x_n)$ and $\mathbf{y}=(y_1, \ldots,y_n)$.

The first term on the right-hand side can be further re-expressed as
$$
\int_{-\infty}^{x_n}\int_{-\infty}^{y_n} \left\lbrace \ldots \left[ \int_{x_2}^{\tilde{x}_4 \wedge x_3}\int_{-\infty}^{y_3}G(x_1,y_1)G(x_2,y_2)\,g(\tilde{x}_3,\tilde{y}_3)\,d\tilde{x}_3\,d\tilde{y}_3 \right] \ldots\right\rbrace g(\tilde{x}_n, \tilde{y}_n)\,d\tilde{x}_n \,d \tilde{y}_n +R_n'(\mathbf{x}_{-2}, \mathbf{y})+R_n''(\mathbf{x}_{-3}, \mathbf{y}),
$$
where the remainder terms $R_n'(\mathbf{x}_{-2}, \mathbf{y})$ and $R_n''(\mathbf{x}_{-3}, \mathbf{y})$ do not depend on $x_2$ and $x_3$, respectively. Iterating the procedure, we finally obtain
$$
n!\,\mathbb{P}(X_1<X_2<\ldots<X_n \text{ and } X_i \leq x_i,\, Y_i \leq y_i,\ 1 \leq i \leq n)=n! \prod_{i=1}^nG(x_i,y_i) + R_n'''(\mathbf{x}_{-n},\mathbf{y}),
$$
where the remainder term $R_n'''(\mathbf{x}_{-n},\mathbf{y})$ does not depend on $x_n$ and collects all the remainder terms produced along the iteration.
Therefore, differentiating with respect to $x_1, \ldots, x_n, y_1, \ldots,y_n$ (the remainder term vanishes, since it does not depend on $x_n$), we are left, on the region $x_1<x_2<\ldots<x_n$, with the expression in the first display.
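The bivariate result can also be checked numerically. The sketch below (assuming `numpy`; the dependent law $Y = 0.7X + 0.5Z$ with $X, Z$ standard normal, the box thresholds, and $n=2$ are arbitrary choices) compares a direct simulation of the reordered pairs with a Monte Carlo evaluation of the claimed density $2\,g(x_1,y_1)g(x_2,y_2)\mathbf{1}(x_1<x_2)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sim = 400_000

def draw(n):
    # Correlated bivariate sample with a density g (its exact form is not needed here)
    x = rng.normal(size=n)
    y = 0.7 * x + 0.5 * rng.normal(size=n)
    return x, y

a = b = c = d = 0.5  # box thresholds for (X_(1), Y_(1), X_(2), Y_(2))

# Estimator 1: directly simulate the pairs reordered by the first component (n = 2)
x1, y1 = draw(n_sim); x2, y2 = draw(n_sim)
swap = x1 > x2
x1o = np.where(swap, x2, x1); y1o = np.where(swap, y2, y1)
x2o = np.where(swap, x1, x2); y2o = np.where(swap, y1, y2)
p_direct = np.mean((x1o <= a) & (y1o <= b) & (x2o <= c) & (y2o <= d))

# Estimator 2: integrate the claimed density 2! g(x1,y1) g(x2,y2) 1(x1 < x2)
# over the box, by Monte Carlo over i.i.d. draws from g
u1, v1 = draw(n_sim); u2, v2 = draw(n_sim)
p_formula = np.mean(2.0 * (u1 < u2) * ((u1 <= a) & (v1 <= b) & (u2 <= c) & (v2 <= d)))

print(p_direct, p_formula)  # the two estimates agree up to Monte Carlo error
```

Agreement of the two estimators is exactly what the answer asserts: the distribution of the reordered pairs has density $n!\,g(x_1,y_1)g(x_2,y_2)$ on $\{x_1<x_2\}$.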
