[Math] Let $X$ be a random variable with mean $0$ and finite variance $\sigma^2$. By applying Markov’s inequality show that

probabilityprobability distributions

I am looking for confirmation that I am working in the correct direction as well as pointers for points where I have gone astray. Here is the problem.

(a) Let $X$ be a random variable with mean $0$ and finite variance $\sigma^2$. By applying Markov’s inequality to the random variable $W = (X + t)^2, t > 0$, show that

$$P (X \geq a) \leq \dfrac{\sigma^2}{\sigma^2 + a^2}\;\;\;\;\;\; \text{for any $a$} > 0.$$

(b) Hence show that, for any $a > 0$,
$$P(Y \geq \mu+a)\leq\sigma^2+a2$$
where $E(Y) = \mu, \;var(Y) = \sigma^2$.

(c) A set of $200$ people consisting of $100$ men and $100$ women is randomly divided into $100$ pairs of $2$ each. Number the men arbitrarily from $1$ to $100$, and for $i = 1,2,…,100$, let
$$z_i=\begin{cases}
1;\qquad\text{if man i is paired with a woman}\\
0;\qquad else
\end{cases}$$

Let Z be the number of these pairs that consist of a man and a woman,
$$z=\sum_{i=1}^{\infty}z_i$$
(i) Find E(Z).

(ii) Find var(Z).

(iii) Using (b), give an upper bound to the probability that Z ≤ 30.

Attempt:

Part (a) seems fairly straight forward, thus,
\begin{align}
P(X\geq a)&=P(X+t-t\geq a)\\
&=P(X+t\geq t+ a)\\
&\leq P((X+t)^2\geq (a+t^2))\\
&\leq \dfrac{E((X+t)^2)}{E((a+t)^2)}\qquad\text{by Markov inequality }\\
&=\dfrac{var(t)}{var(a+t)}\qquad\text{by definition on expectation}\\
&=\dfrac{\sigma^2}{\sigma^2+a^2}
\end{align}

(b)

For this one i feel as though i have made a fallacy; nonetheless, See that
\begin{align}
P(Y\leq \mu+a)&\leq P(Y^2\leq \mu^2+a^2)\\
&=\dfrac{E((Y)^2)}{E((\mu^2+a^2))}\\
&=\dfrac{var(Y)}{var(\mu)+var(a^2)}\\
&=\dfrac{\sigma^2}{\sigma^2+a^2}
\end{align}

(c) (i)

For this one i feel it should just be as follows,
\begin{align}
E(z)&=E(\sum_{i=1}^{100}z_i)\\
&=\sum_{i=1}^{100}E(z_i)\\
&=\dfrac{(100)(101)}{2}\\
&=5050
\end{align}
However, I suspect that it cannot be that simple as when I apply the same logic to find $var(z)$ I end up with a negative number. But how can this be? I must be making a mistake somewhere.

(iii)

Amended

On further inspection I see we should use the results found in the previous parts. thus, we want to use
\begin{align}
P(z\leq\mu +a)\leq\dfrac{\sigma^2}{sigma^2+a^2}
\end{align}
where $\sigma^2$ comes from the variance found previously and $a$ is given to be 30.

Best Answer

You have made some mistakes for (a)-(b): (so I did not go further, haven't fully checked your (c))

  • Writing $\operatorname{Var} t$, $\operatorname{Var} (a+t)$ is a good sign something went wrong: $a,t$ are constants here (the r.v. is $X$ or $Y$), so both variances are... $0$.

  • Another one is that $(\mu+a)^2 \neq \mu^2+a^2$ in general: $$ \mathbb{P}\{Y \geq \mu+a\} \leq \mathbb{P}\{Y^2 \geq (\mu+a)^2\}\neq \mathbb{P}\{Y^2 \geq \mu^2+a^2\} $$

  • Another one is that $\mathbb{E}[Y^2] = \operatorname{Var} Y + \mathbb{E}[Y]^2$, not $\mathbb{E}[Y^2] = \operatorname{Var} Y$.

What you can do is to apply (for any parameter $t\geq 0$) Markov's inequality to the (non-negative) random variable $X_t\stackrel{\rm def}{=}(Y-\mu +t)^2$ (here, I am dealing directly with the case (b) of general mean $\mu$, for (a) you can basically take $\mu=0$), so that $$ \mathbb{P}\{Y \geq \mu+a\} = \mathbb{P}\{Y - \mu + t \geq a + t\} \leq \mathbb{P}\{X_t \geq (a+t)^2\} \leq \frac{\mathbb{E}[X_t]}{(a+t)^2} = \frac{\operatorname{Var} Y + t^2}{(a+t)^2} $$ (the last equality requires expanding the square to verify that $\mathbb{E}[X_t] = \operatorname{Var} Y + t^2$.) Now, this holds for any $t\geq 0$, so you can differentiate and maximize the RHS for the "best possible $t$, which will give you what you want.

Remark: this is actually called "one-sided Chebyshev," or more accurately Cantelli's inequality.


Followup on (c):

By linearity of expectation (why do you have an $\infty$ bound on your sum, by the way?), $$ \mathbb{E}[Z] = \sum_{i=1}^{100} \mathbb{E}[Z_i] = \sum_{i=1}^{100} \mathbb{P}\{ Z_i =1 \} $$ Now, fix any $i$. The probability that man $i$ is paired with a woman is exactly $\frac{100}{200-1}=\frac{100}{199}$ (there are 199 other people, 100 of which being women). Therefore, $$ \mathbb{E}[Z] = 100\cdot \frac{100}{199} = \frac{10000}{199}. $$

How did you get $\frac{100\cdot 101}{2}$?

  • For the variance: We do not have linearity anymore, but we can use the fact that $$ \operatorname{Var} Z = \mathbb{E}[Z^2] - \mathbb{E}[Z]^2. $$ The second term, we have it, so let us compute the first: $$\begin{align} \mathbb{E}[Z^2] &= \mathbb{E}\left[ \left(\sum_{i=1}^{100} Z_i \right)^2 \right] = \mathbb{E}\left[ \sum_{i=1}^{100} Z_i^2 + 2\sum_{i<j} Z_iZ_j \right] = \sum_{i=1}^{100} \mathbb{E}\left[ Z_i^2 \right] + 2\sum_{1\leq i < j\leq 100}\mathbb{E}\left[ Z_iZ_j \right] \\ &= \sum_{i=1}^{100} \mathbb{E}\left[ Z_i \right] + 2\sum_{1\leq i < j\leq 100}\mathbb{E}\left[ Z_iZ_j \right] = \mathbb{E}\left[ Z \right] + 2\sum_{1\leq i < j\leq 100}\mathbb{E}\left[ Z_iZ_j \right] \end{align}$$ where in the end we used the fact that $Z_i^2=Z_i$ as $Z_i\in\{0,1\}$. Furthermore, for $i\neq j$ $$ \mathbb{E}\left[ Z_iZ_j \right] = \mathbb{P}\{ Z_i=Z_j=1 \} = \mathbb{P}\{ Z_i=1 \mid Z_j=1 \}\mathbb{P}\{ Z_j=1 \} = \mathbb{P}\{ Z_i=1 \mid Z_j=1 \}\frac{100}{199} = \frac{99}{197}\cdot\frac{100}{199} $$ since if $j$ is paired with a women, there remain $99$ women among the $197$ people that are neither $i,j$ or the women paired with $j$. Therefore, $$\begin{align} \mathbb{E}[Z^2] &= 100\cdot \frac{100}{199} + \binom{100}{2}\frac{99}{197}\cdot\frac{100}{199} \end{align}$$ and along with $\mathbb{E}[Z]^2$ this will give you the variance.