[Math] sum of order statistics

pr.probabilityst.statistics

Suppose I have N real random variables with identical PDF. At every instance of these r.vs, I pick $K$ largest out of $N$. Lets call their sum as $S_K$. Alternatively, based on some criteria, I pick in an average sense $K$ largest numbers (i.e some times we pick K, K-1,K+1 etc, randomly). Lets call their sum $S_i$. Obviously $S_K$ and $S_i$ are r.vs. Can we say anything about the inequality associated with their means. $E(S_i) > E(S_K)$ ?.

What does it depend on? Will the result change for non-identical r.vs.?

Best Answer

The inequality $E(S_K) \geq E(S_i)$ holds.

To avoid any doubt, let me be more specific. Let $Y_1, Y_2, ..., Y_N$ be a collection of random variables, and write $X_1 \geq X_2 \geq ... \geq X_N$ for their reordering in non-increasing order.

Suppose $K < N$ is fixed and let $S_K$ be the sum of the $K$ largest of the random variables, that is $S_K=X_1+...+X_K$.

Let $R$ be a random variable taking values in $0,1,...,N$ which is independent of the random variables $Y_i$. The independence from the $Y_i$ is important of course (this is how I interpret your "based on some criteria". If the $R$ is allowed to depend on the realisation of the $Y_i$ then all sorts of different behaviours are possible).

Now let $S_R$ be the sum of the $R$ largest of the random variables, that is $S_R=X_1+...+X_R$. (In your notation this is $S_i$).

Suppose that $ER=K$. Then I claim that $E S_R \leq E S_K$, with equality iff $R=K$ with probability 1. (Unless the $Y_i$ are somehow degenerate, in which case equality can occur in other cases as well).

Proof: Write $p_k=P(R\geq k)$ for $k=1,2,...,N$. We have $\sum p_k=ER=K$.

Also

$S_R=\sum_{k=1}^N X_k I(R\geq k)$

so

$ES_R=\sum_{k=1}^N P(R\geq k) E X_k = \sum_{k=1}^N p_k E X_k$.

(Here we used the independence of $R$ from the $X_i$).

Consider maximising this sum subject to the constraints that $\sum p_k=K$ and that $1\geq p_1 \geq p_2 \geq p_3\geq ...$.

Since the terms $E X_k$ are decreasing in $k$, the maximum is achieved when $p_k=1$ for $k\leq K$ and $p_k=0$ for $k>K$. (Provided the $Y_i$ are not degenerate, the terms $E X_k$ are strictly decreasing, and this is the only way to achieve the maximum. If not, the maximum may be achieved in some other cases too).

That is, the maximum value of $ES_R$ occurs precisely if $R$ is equal to $K$ with probability 1.

It doesn't matter whether the $Y_i$ are identically distributed, and also they don't need to be independent. However, it is important that $R$ is independent of the $Y_i$.

Related Question