Linearity of Expectation in Bernoulli Trials

binomial-distribution, discrete-mathematics, expectation, probability

I am having a hard time understanding linearity of expectation. Here is my understanding.

$X$ is a random variable defined on a sample space $S$, with distribution given by $Pr\{X=x\}= \sum_{s \in S:X(s)=x}Pr\{s\}$.

The expectation of a random variable is the weighted average defined as $E[X]= \sum_{x}x\cdot Pr\{X=x\}$.

Also, linearity of expectation says that if $X$ is a random variable which is the sum of other random variables $X_1,X_2,X_3,\dots,X_n$, i.e.,
$X=X_1+X_2+X_3+\dots+X_n$, then $$E[X]=E[X_1]+E[X_2]+E[X_3]+\dots+E[X_n]=\sum_{i=1}^nE[X_i].$$
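
This identity can be checked by brute force on a tiny example. Here is a minimal sketch, assuming two fair coin flips as an illustrative case (the encoding $1=$ heads, $0=$ tails and the variable names are my choices):

```python
from itertools import product
from fractions import Fraction

# Check E[X1 + X2] = E[X1] + E[X2] on two fair coin flips
# (1 = heads, 0 = tails), enumerated exhaustively with exact fractions.
pr = Fraction(1, 4)  # each of the 4 ordered outcomes is equally likely

outcomes = list(product([0, 1], repeat=2))
E_sum = sum((a + b) * pr for a, b in outcomes)
E_1 = sum(a * pr for a, b in outcomes)
E_2 = sum(b * pr for a, b in outcomes)

print(E_sum, E_1 + E_2)  # both equal 1
```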

Let's imagine we are performing $n$ Bernoulli trials in which the success probability is $p$ and the failure probability is $q$, and we define a random variable $X=$ number of successes, where $X$ can take values in $\{0,1,2,3,\dots,n\}$. This gives $$E[X]=0\cdot Pr\{X=0\}+1\cdot Pr\{X=1\}+2\cdot Pr\{X=2\}+3\cdot Pr\{X=3\}+\dots+n\cdot Pr\{X=n\}.$$ I can see that $E[X]=np$ from the definition of the binomial distribution formed by the Bernoulli trials, after applying some manipulations to it.
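
As a sanity check on $E[X]=np$, here is a minimal sketch that evaluates the binomial sum directly; the values of $n$ and $p$ are arbitrary illustrative choices:

```python
from math import comb

# Evaluate E[X] = sum_k k * C(n,k) * p^k * q^(n-k) directly and
# compare with n*p; n and p are arbitrary illustrative choices.
n, p = 10, 0.3
q = 1 - p

E_X = sum(k * comb(n, k) * p**k * q**(n - k) for k in range(n + 1))
print(E_X, n * p)  # both 3.0, up to floating-point rounding
```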

In the book I am reading, it is said that, instead of doing complex algebra to find the expectation, we can express $X$ as a sum of smaller random variables and apply linearity of expectation. Let $X_i$ be the random variable which describes the number of successes in the $i^{th}$ trial. It is said that $X=X_1+X_2+X_3+\dots+X_n$ (I don't understand why this is true).

Also, in order to find $E[X_i]$ we use indicator random variables, for which $E[X_i]=1\cdot p+0\cdot q=p$; that means $X_i$ can take the values $0,1$.

I don't understand why a random variable which describes the number of successes in all the trials is the sum of the successes in each trial.

Also, when we are computing the original expectation $E[X]$, we are working on a sample space which has $2^n$ sample points (the domain of $X$), because each trial has two possible outcomes and we have $n$ trials.

To calculate the expectation of the individual random variables, why did we shrink our sample space to two points? I know it is because we are working on a per-trial basis, but my expectation is that all the individual random variables should also be defined on the original sample space, not per trial.

If I do the math I get the correct value, but what I lack here is intuition.

Best Answer

For simplicity let $n=2$ and throw a coin twice.

We speak of a success if heads turns up, and $X$ is by definition the number of successes, so it takes values in $\{0,1,2\}$.

Further define $X_1$ by stating that $X_1$ takes value $1$ if the first throw gives heads, and takes value $0$ otherwise.

Likewise define $X_2$ by stating that $X_2$ takes value $1$ if the second throw gives heads, and takes value $0$ otherwise.

Can you grasp that $X=X_1+X_2$?

If not directly then look at all $4$ different outcomes we can get: $$\begin{array}{ccccc} & HH & HT & TH & TT\\ X & 2 & 1 & 1 & 0\\ X_{1} & 1 & 1 & 0 & 0\\ X_{2} & 1 & 0 & 1 & 0\\ X_{1}+X_{2} & 2 & 1 & 1 & 0 \end{array}$$
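
The table can also be reproduced by brute force. A minimal sketch (the encoding $1=$ heads, $0=$ tails is my choice):

```python
from itertools import product

# Reproduce the table above: enumerate all 4 outcomes of two throws
# (1 = heads, 0 = tails); X counts heads, X1 and X2 are the per-throw
# indicators, and X = X1 + X2 holds at every single outcome.
for omega in product([1, 0], repeat=2):
    x = list(omega).count(1)  # X: total number of heads
    x1, x2 = omega            # X1, X2: indicators for throws 1 and 2
    print(omega, x, x1, x2, x1 + x2)
    assert x == x1 + x2
```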


For finding, e.g., the expectation of a Bernoulli random variable $B$ we only have to know the value of $P(B=1)$. What the underlying sample space looks like is irrelevant.


edit

To calculate Expectation of individual random variables why did we shrink our sample space to two?

We do not!

Let us build a model for the situation that you describe. As sample space we can use the set $\Omega:=\{0,1\}^n$. Every subset of $\Omega$ is an event and must be equipped in a suitable way with a probability. If the probability of success is $p$, then each tuple $\omega=(\omega_1,\dots,\omega_n)\in\{0,1\}^n$ induces an event $\{\omega\}\subseteq\Omega$, and it is equipped with probability $p^{\sum_{i=1}^n\omega_i}(1-p)^{n-\sum_{i=1}^n\omega_i}$ of occurring. So for a subset $B\subseteq\{0,1\}^n$ we have $$P(B)=\sum_{\omega\in B}p^{\sum_{i=1}^n\omega_i}(1-p)^{n-\sum_{i=1}^n\omega_i}.$$
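
A minimal sketch of this model in code, checking that the probabilities of the singletons sum to $1$; the values of $n$ and $p$ are arbitrary illustrative choices:

```python
from itertools import product

# Build Omega = {0,1}^n and assign each singleton {omega} the
# probability p^(#successes) * (1-p)^(#failures); summed over all
# of Omega these must give 1. n and p are illustrative choices.
n, p = 4, 0.3

def prob(omega):
    s = sum(omega)
    return p**s * (1 - p)**(n - s)

Omega = list(product([0, 1], repeat=n))
print(sum(prob(w) for w in Omega))  # 1.0, up to floating-point rounding
```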

Any function $Y:\Omega\to\mathbb R$ is a random variable.

Examples are:

  • function $X$ prescribed by $\omega\mapsto\sum_{i=1}^n\omega_i$
  • for $i=1,\dots, n$ function $X_i$ prescribed by $\omega\mapsto\omega_i$

Now observe that for every $\omega\in\Omega$ we have:$$X(\omega)=X_1(\omega)+\cdots+X_n(\omega)$$ leading to:$$\mathbb EX=\int X(\omega)P(d\omega)=\int X_1(\omega)+\cdots+X_n(\omega)P(d\omega)=\sum_{i=1}^n\int X_i(\omega)P(d\omega)=\sum_{i=1}^n\mathbb EX_i$$
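
To see concretely that nothing is shrunk, here is a minimal sketch that computes $E[X]$ and every $E[X_i]$ as sums over the same $\Omega$ with the same measure $P$; $n$ and $p$ are again arbitrary choices:

```python
from itertools import product

# Compute E[X] and each E[X_i] as sums over the *same* sample space
# Omega = {0,1}^n with the same measure P; no shrinking takes place.
n, p = 4, 0.3

def prob(omega):
    s = sum(omega)
    return p**s * (1 - p)**(n - s)

Omega = list(product([0, 1], repeat=n))
E_X = sum(sum(w) * prob(w) for w in Omega)                     # X(w) = sum_i w_i
E_Xi = [sum(w[i] * prob(w) for w in Omega) for i in range(n)]  # X_i(w) = w_i

print(E_X, sum(E_Xi), n * p)  # all equal 1.2
```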

The function $X$ has a binomial distribution with parameters $n$ and $p$.

The functions $X_i$ are i.i.d. and have a Bernoulli distribution with parameter $p$.
