Probability – Expected Value of Order Statistics Conditional on a Condition

conditional-expectation, order-statistics, probability, statistics

Consider $N$ random variables $X_1, X_2, \ldots, X_N$ that are i.i.d. with cumulative distribution function $F$. Assume we receive a signal that says that exactly $n$ of the random variables have values above some threshold $t$ (however, we don't know which). To ease notation, let $S_A$ denote this subset of random variables, and let $S_B$ denote the remaining $N-n$ variables. Let $g(S_A) = \min(S_A)$ be the 1st order statistic of $S_A$.

1) What is the conditional expected value of $g(S_A)$?
$$\mathbb{E}[g(S_A)|t,n]$$

I know that the pdf and expected value of the 1st order statistic of the entire set, i.e. $X_{(1)} = \min(X_1, X_2, \ldots, X_N)$, are, respectively,
$$f_{X_{(1)}}(x) = N(1-F(x))^{N-1}f(x)$$
$$\mathbb{E}[X_{(1)}] = N \int_{-\infty}^\infty x \left(1 - F(x)\right)^{N-1} f(x) dx$$
Setting $N=n$ in the equation above would not give $\mathbb{E}[g(S_A)|t,n]$, since I haven't taken account of the fact that the lowest $N-n$ random variables have values below $t$. I think I need something like
$$\mathbb{E}[g(S_A)|t,n] = \mathbb{E}[X_{(N-n+1)}| X_{(N-n)} < t]$$

Furthermore let $h(S_B) = h(|S_B|) = h(N-n)$ be a linear function of the size of $S_B$.

2) What is the conditional expected value of $g(S_A)h(S_B)$?
$$\mathbb{E}[g(S_A)h(S_B)|t,n]$$
For general functions $g$ and $h$, $\mathbb{E}[g(S_A)h(S_B)|t,n] \ne \mathbb{E}[g(S_A)|t,n] \times \mathbb{E}[h(S_B)|t,n]$, since $S_A$ and $S_B$ can be considered dependent random variables. But is it the case that $\mathbb{E}[g(S_A)h(S_B)|t,n] = \mathbb{E}[g(S_A)|t,n] \times \mathbb{E}[h(S_B)|t,n]$ when $h$ is a function of the size of $S_B$?

Best Answer

We are told that exactly $n$ of the random variables have a value greater than $t$. It's clear (perhaps not so much?) that, given this, those $n$ variables are still independent, and the only effect of the conditioning is that each one follows $F$ truncated to $(t, \infty)$. So the standard result for the 1st order statistic applies, with the truncated distribution in place of $F$.

Let $G(x)$ be the cumulative distribution function of each of the $n$ truncated variables, for $x > t$ (assuming $F(t) < 1$). Then $$G(x) = \frac{F(x)-F(t)}{1-F(t)}$$

(Here, and in what follows, we implicitly condition on $n,t$.)

Letting $A(x)$ be the CDF of the minimum, we get

$$A(x)= 1 - (1-G(x))^n=1 - \left(1-\frac{F(x)-F(t)}{1-F(t)}\right)^n=1 - \left(\frac{1-F(x)}{1-F(t)}\right)^n$$

Differentiating gives the density of the minimum, $a(x) = A'(x)$, from which you can readily compute the expectation and solve point 1):

$$\mathbb{E}[g(S_A)|t,n] = \int_t^\infty x\, a(x)\, dx = \int_t^\infty x\, n \left(\frac{1-F(x)}{1-F(t)}\right)^{n-1} \frac{f(x)}{1-F(t)}\, dx$$
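As a sanity check on the integral above, here is a minimal Python sketch (not part of the original answer). It assumes, purely for illustration, that $F$ is the Exponential(1) distribution, for which memorylessness gives the closed form $t + 1/n$. The function names `expected_min_above` and `simulate_min_above`, and the choices $t = 0.5$, $n = 3$, $N = 5$, are hypothetical. The first function integrates the formula numerically with Simpson's rule; the second estimates the same quantity by rejection sampling, keeping only draws in which exactly $n$ of the $N$ values exceed $t$.

```python
import math
import random


def expected_min_above(t, n, cdf, pdf, upper, steps=20000):
    """Simpson's rule for the integral of
    x * n * ((1-F(x))/(1-F(t)))**(n-1) * f(x)/(1-F(t)) over [t, upper].
    `upper` stands in for infinity; `steps` must be even."""
    tail = 1.0 - cdf(t)  # P(X > t), assumed positive

    def integrand(x):
        return x * n * ((1.0 - cdf(x)) / tail) ** (n - 1) * pdf(x) / tail

    h = (upper - t) / steps
    total = integrand(t) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(t + i * h)
    return total * h / 3.0


def simulate_min_above(t, n, N, sampler, trials, rng):
    """Rejection sampling: draw N i.i.d. values, keep the draw only if
    exactly n of them exceed t, and average the minimum of those n."""
    total, kept = 0.0, 0
    while kept < trials:
        above = [x for x in (sampler(rng) for _ in range(N)) if x > t]
        if len(above) == n:
            total += min(above)
            kept += 1
    return total / kept


# Illustrative assumption: F is Exponential(1).
def exp_cdf(x):
    return 1.0 - math.exp(-x)


def exp_pdf(x):
    return math.exp(-x)


def exp_sampler(rng):
    return rng.expovariate(1.0)


t, n, N = 0.5, 3, 5
numeric = expected_min_above(t, n, exp_cdf, exp_pdf, upper=t + 40.0)
simulated = simulate_min_above(t, n, N, exp_sampler, 20000, random.Random(0))
print("formula (numeric integral):", numeric)
print("Monte Carlo estimate:      ", simulated)
print("closed form t + 1/n:       ", t + 1.0 / n)
```

The three printed values should agree up to integration and sampling error. Note that the simulation never uses the truncated distribution directly; it conditions on the event "exactly $n$ above $t$" in the full sample, which is what makes it a check of the truncation argument rather than a restatement of it.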

The rest is rather trivial: conditional on $(n,t)$, $h(S_B) = h(N-n)$ is deterministic, hence it goes outside the expectation.
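
Spelled out as a short worked step, using only that $h(S_B) = h(N-n)$ is a constant once $n$ is given (as the question defines it):

$$\mathbb{E}[g(S_A)h(S_B)|t,n] = h(N-n)\,\mathbb{E}[g(S_A)|t,n] = \mathbb{E}[h(S_B)|t,n]\,\mathbb{E}[g(S_A)|t,n]$$

So the product form asked about in 2) does hold when $h$ depends only on the size of $S_B$.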