Chebyshev in relation to average and expectation

Tags: expected value, probability distributions, probability theory

I have two questions about the average and the expectation, and how they relate to Chebyshev's inequality. My textbook states the following.

In order to illustrate the relative advantage of Chebyshev's inequality compared to Markov's, consider the following example. Let $X_{1}, \ldots, X_{n}$ be $n$ independent identically distributed Bernoulli random variables and let $\hat{\mu}_{n}=\frac{1}{n} \sum_{i=1}^{n} X_{i}$ be their average. We would like to bound the probability that $\hat{\mu}_{n}$ deviates from $\mathbb{E}\left[\hat{\mu}_{n}\right]$ by more than $\varepsilon$ (this is the central question in machine learning). We have $\mathbb{E}\left[\hat{\mu}_{n}\right]=\mathbb{E}\left[X_{1}\right]=\mu$ and, by independence of the $X_{i}$'s and Theorem B.26, $\operatorname{Var}\left[\hat{\mu}_{n}\right]=\frac{1}{n^{2}} \operatorname{Var}\left[n \hat{\mu}_{n}\right]=\frac{1}{n^{2}} \sum_{i=1}^{n} \operatorname{Var}\left[X_{i}\right]=\frac{1}{n} \operatorname{Var}\left[X_{1}\right]$. By Markov's inequality
$$
\mathbb{P}\left(\hat{\mu}_{n}-\mathbb{E}\left[\hat{\mu}_{n}\right] \geq \varepsilon\right)=\mathbb{P}\left(\hat{\mu}_{n} \geq \mathbb{E}\left[\hat{\mu}_{n}\right]+\varepsilon\right) \leq \frac{\mathbb{E}\left[\hat{\mu}_{n}\right]}{\mathbb{E}\left[\hat{\mu}_{n}\right]+\varepsilon}=\frac{\mathbb{E}\left[X_{1}\right]}{\mathbb{E}\left[X_{1}\right]+\varepsilon}
$$

Note that as $n$ grows the inequality stays the same. By Chebyshev's inequality we have
$$
\mathbb{P}\left(\hat{\mu}_{n}-\mathbb{E}\left[\hat{\mu}_{n}\right] \geq \varepsilon\right) \leq \mathbb{P}\left(\left|\hat{\mu}_{n}-\mathbb{E}\left[\hat{\mu}_{n}\right]\right| \geq \varepsilon\right) \leq \frac{\operatorname{Var}\left[\hat{\mu}_{n}\right]}{\varepsilon^{2}}=\frac{\operatorname{Var}\left[X_{1}\right]}{n \varepsilon^{2}}
$$

Theorem B.26. If $X_{1}, \ldots, X_{n}$ are independent random variables then
$$
\operatorname{Var}\left[\sum_{i=1}^{n} X_{i}\right]=\sum_{i=1}^{n} \operatorname{Var}\left[X_{i}\right]
$$
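To see the textbook's comparison concretely, here is a minimal simulation sketch (my own, not from the book; the choices $\mu = 0.5$ and $\varepsilon = 0.1$ are arbitrary). It estimates $\mathbb{P}\left(\hat{\mu}_{n}-\mu \geq \varepsilon\right)$ by Monte Carlo and prints both bounds: the Markov bound is constant in $n$, while the Chebyshev bound shrinks like $1/n$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, eps, trials = 0.5, 0.1, 100_000
var_x1 = mu * (1 - mu)  # Var[X_1] for a Bernoulli(mu) variable

for n in (10, 100, 1000):
    # `trials` independent sample averages, each over n Bernoulli(mu) draws
    averages = rng.binomial(n, mu, size=trials) / n
    empirical = np.mean(averages - mu >= eps)
    markov = mu / (mu + eps)            # constant in n
    chebyshev = var_x1 / (n * eps**2)   # shrinks like 1/n
    print(f"n={n:5d}  empirical={empirical:.4f}  "
          f"Markov={markov:.4f}  Chebyshev={chebyshev:.4f}")
```

(For small $n$ the Chebyshev bound can exceed $1$ and is then vacuous, but it soon drops below the fixed Markov bound.)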

What I don't understand is why it is the case that:

$\mathbb{E}\left[\hat{\mu}_{n}\right]=\mathbb{E}\left[X_{1}\right]=\mu$

As I understand it, expectation is the same as average. In this whole comparison we are interested in the deviation between the average and the expectation given $n$ r.v.'s, as I understand it. I am, however, unsure why it is then the case that $\mathbb{E}\left[X_{1}\right]=\mu$, and what that expression actually means. I suspect it doesn't mean that you are looking at the average/expectation of just $\mathbb{E}\left[X_{1}\right]=\frac{1}{1} \sum_{n} \Pr\left[X=x_{1}\right]$, where $x_{1}$ is the value corresponding to the variable $X_{1}$ and $n=1$ denotes the number of possible values that $X_{1}$ can take, which is to say just a single value.

Furthermore, I am not sure why $\mathbb{E}\left[\hat{\mu}_{n}\right]=\mathbb{E}\left[X_{1}\right]$ holds.

My second question relates to $\operatorname{Var}\left[\hat{\mu}_{n}\right]=\frac{1}{n^{2}} \operatorname{Var}\left[n \hat{\mu}_{n}\right]=\frac{1}{n^{2}} \sum_{i=1}^{n} \operatorname{Var}\left[X_{i}\right]=\frac{1}{n} \operatorname{Var}\left[X_{1}\right]$.
As I understand it, the first step of that chain should instead read:

$\operatorname{Var}\left[\hat{\mu}_{n}\right]=\frac{1}{n} \operatorname{Var}\left[n \hat{\mu}_{n}\right]$

I am not sure if the $n^{2}$ comes from some rule about expectation in relation to the average that I am missing, which might also explain my first question.

EDIT
I think Galton answered my first question and I do want to accept his answer. However, I still have a hard time understanding his comment on the second question, which I continue to struggle with.

Best Answer

I think the key point you are missing is that your $X_{i}$'s are identically distributed. That means moments like the expectation and the variance do not depend on $i$. We have $\mathbb{E}\left[X_{1}\right]=\mathbb{E}\left[X_{2}\right]=\cdots=\mathbb{E}\left[X_{n}\right]$, so we can replace every instance of $\mathbb{E}\left[X_{i}\right]$ with $\mathbb{E}\left[X_{1}\right]$. The same is true for the variances.

Also, $\hat{\mu}_{n}$ is the sample average of $X_{1}, \ldots, X_{n}$, a random quantity, while $\mathbb{E}[X]$ is the expectation, a fixed number determined by the distribution. For discrete $X$ this is $\sum_{j} \mathbb{P}\left(X=x_{j}\right) x_{j}$, but for more general $X$ it is generally an integral.
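For completeness, here is the algebra behind both identities, a sketch using only linearity of expectation, the scaling rule $\operatorname{Var}[aX]=a^{2} \operatorname{Var}[X]$ for a constant $a$, and Theorem B.26:

$$
\mathbb{E}\left[\hat{\mu}_{n}\right]=\frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\left[X_{i}\right]=\frac{1}{n} \cdot n \, \mathbb{E}\left[X_{1}\right]=\mathbb{E}\left[X_{1}\right]=\mu
$$

$$
\operatorname{Var}\left[\hat{\mu}_{n}\right]=\operatorname{Var}\left[\frac{1}{n} \cdot n \hat{\mu}_{n}\right]=\frac{1}{n^{2}} \operatorname{Var}\left[n \hat{\mu}_{n}\right]=\frac{1}{n^{2}} \sum_{i=1}^{n} \operatorname{Var}\left[X_{i}\right]=\frac{n \operatorname{Var}\left[X_{1}\right]}{n^{2}}=\frac{\operatorname{Var}\left[X_{1}\right]}{n}
$$

The $n^{2}$ in the second chain is exactly the scaling rule at work: a constant factor comes out of a variance squared, whereas it comes out of an expectation linearly. This is why $\operatorname{Var}\left[\hat{\mu}_{n}\right]=\frac{1}{n^{2}} \operatorname{Var}\left[n \hat{\mu}_{n}\right]$ rather than $\frac{1}{n} \operatorname{Var}\left[n \hat{\mu}_{n}\right]$.

And to make the average-versus-expectation distinction tangible, a small Python sketch (the distribution and seed are arbitrary choices of mine): the expectation is computed once from the probabilities, while the sample average is computed from random draws and merely concentrates around it.

```python
import numpy as np

rng = np.random.default_rng(1)
values = np.array([0.0, 1.0])   # possible outcomes of a Bernoulli X
probs = np.array([0.7, 0.3])    # P(X = 0) and P(X = 1)

# Expectation: a fixed number, sum_j P(X = x_j) * x_j = 0.3
expectation = np.sum(probs * values)

# Sample average: a random quantity computed from draws of X
draws = rng.choice(values, size=10_000, p=probs)
print(expectation, draws.mean())  # 0.3 versus roughly 0.3, plus noise
```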