Why is the expected value of a random variable equal to the mean?

Tags: distributions, expected-value, probability

Here is a derivation I found online of the expected value of the sample mean:

Let $X_1, X_2, \dots, X_n$ be $n$ independently drawn observations from a
distribution with mean $\mu.$

Let $\bar X$ be the mean of these $n$ independent observations:
$\bar X = \frac{X_1 + X_2 + \cdots + X_n}{n}.$

$$\begin{aligned}
E(\bar X) &= E\left( \frac{X_1 + X_2 + \dots + X_n}{n} \right) \\
&= \frac{1}{n}E(X_1 + X_2 + \dots + X_n) \\
&= \frac{1}{n} \left( E(X_1) + E(X_2) + \dots + E(X_n)\right) \\
&= \frac{1}{n} ( \mu + \mu + \dots + \mu) \\
&= \mu
\end{aligned}$$
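The derivation can also be checked numerically. Below is a minimal R sketch, assuming (for illustration only) a population that is exponential with rate 2, so that $\mu = 1/2$; the sample size `n = 5` and the number of repetitions `reps` are arbitrary choices. It draws many independent samples of size $n$, computes $\bar X$ for each, and averages those sample means.

```r
set.seed(1)                 # for reproducibility
n    <- 5                   # size of each sample (arbitrary choice)
reps <- 100000              # number of independent samples of size n
mu   <- 1/2                 # mean of the assumed Exp(rate = 2) population

# Each column holds one sample X_1, ..., X_n; colMeans() gives one X-bar per sample
samples <- matrix(rexp(n * reps, rate = 2), nrow = n)
xbar    <- colMeans(samples)

mean(xbar)                  # long-run average of X-bar: close to mu = 0.5
```

Any other population with a finite mean would do; the average of the simulated $\bar X$ values should settle near that population's $\mu$.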

Why is $E(X_i) = \mu$ for each $i = 1, 2, \dots, n$?

In other words, if the expected value is the same thing as the average, what do we mean by the average of a single random variable (i.e., $E(X_1) = \mu$)?

I'm not sure the question is well structured, but even after trying to understand the concept of expected value, I still can't quite grasp it.

Best Answer

In my comment, I said that an observation $X_i$ inherits all of the probability properties of the population from which it was sampled. In particular, $X_i$ has the same distribution as the population, so its expected value is the population mean: $E(X_i) = \mu$ for every $i.$ Of course, no single observed value can exhibit all of these properties by itself, but if we take a large sample from a population, we can infer much of the probability information that's in the population.
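One way to see what $E(X_1) = \mu$ means is to repeat the whole sampling experiment many times and look only at the first observation each time. A minimal R sketch, assuming a $\mathsf{Unif}(0,1)$ population (so $\mu = 1/2$) and an arbitrary sample size `n = 5`:

```r
set.seed(2)
reps <- 100000   # number of repetitions of the whole sampling experiment
n    <- 5        # size of each sample (arbitrary choice)

# Draw `reps` samples of size n from Unif(0,1); row i holds X_i from every repetition
samples <- matrix(runif(n * reps), nrow = n)
x1 <- samples[1, ]          # the first observation X_1 from each repetition

mean(x1)                    # long-run average of X_1: close to mu = 1/2
mean(samples[n, ])          # the last observation X_n behaves the same way
```

Averaged over many repetitions, any single position $X_i$ in the sample settles near $\mu,$ which is what $E(X_i) = \mu$ expresses.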

In particular, if we take the sample mean (average) $\bar X$ of all the elements of a large sample, then $\bar X$ is likely to be near $\mu.$ Because the observations are independent, $\operatorname{Var}(\bar X) = \sigma^2/n,$ where $\sigma^2$ is the population variance, so for large $n$ the variability of $\bar X$ is small. Its standard deviation $\sigma/\sqrt{n}$ gives an idea of how near the sample mean $\bar X$ is likely to be to the population mean $\mu.$
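The formula $\operatorname{Var}(\bar X) = \sigma^2/n$ can be checked by simulation. A minimal R sketch, assuming a $\mathsf{Unif}(0,1)$ population ($\sigma^2 = 1/12$) and an illustrative sample size `n = 10`:

```r
set.seed(3)
n      <- 10        # sample size (illustrative choice)
reps   <- 50000     # number of simulated samples
sigma2 <- 1/12      # population variance of Unif(0,1)

# One sample of size n per column; colMeans() returns one X-bar per sample
xbar <- colMeans(matrix(runif(n * reps), nrow = n))

var(xbar)           # simulated variance of X-bar
sigma2 / n          # theoretical value sigma^2 / n = 1/120
```

Increasing `n` shrinks both numbers in proportion to $1/n,$ which is why the mean of a large sample tends to land close to $\mu.$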

If we look at the population of points placed uniformly at random in the interval $(0,1),$ then the population has the distribution $\mathsf{Unif}(0,1),$ with population mean $\mu = 1/2$ and population variance $\sigma^2 = 1/12.$ Also, about 25% of the points lie between $3/4$ and $1.$
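These population values follow from the definitions $\mu = \int_0^1 x\,f(x)\,dx$ and $\sigma^2 = \int_0^1 (x-\mu)^2 f(x)\,dx,$ with density $f(x) = 1$ on $(0,1),$ and $P(3/4 < X < 1) = F(1) - F(3/4).$ A small R sketch that checks them numerically with `integrate()`, `dunif()` and `punif()`:

```r
# Population mean: integral of x * f(x) over (0,1), where f = dunif is 1 on (0,1)
mu <- integrate(function(x) x * dunif(x), 0, 1)$value

# Population variance: integral of (x - mu)^2 * f(x) over (0,1)
sigma2 <- integrate(function(x) (x - mu)^2 * dunif(x), 0, 1)$value

# P(3/4 < X < 1) from the cumulative distribution function
p <- punif(1) - punif(3/4)

c(mu = mu, sigma2 = sigma2, p = p)   # approximately 1/2, 1/12 and 1/4
```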

As an experiment, I will use R to take a sample of $n = 10{,}000$ values from this distribution. Then let's see what the mean of that large sample is, and what proportion of the sampled points actually lie between $3/4$ and $1.$

```r
x = runif(10000)        # 10,000 draws from Unif(0,1); no seed set, so results vary by run
mean(x)
## [1] 0.5008642        # sample mean is very close to population mean 1/2
mean(x > 3/4 & x < 1)
## [1] 0.248            # very nearly 25% of observations between 3/4 and 1
var(x);  1/12
## [1] 0.08267011       # sample variance; nearly the population variance 1/12
## [1] 0.08333333       # ... exactly 1/12
```

We see that $\bar X = 0.500864,$ very near $1/2,$ and that 24.8% of the sampled values lie in $(3/4, 1).$ (Showing how the variance of $\bar X$ works would require a messier simulation, which I will skip for now.)

A histogram of the 10,000 values is shown below. The vertical black line near $1/2$ marks the position of $\bar X,$ and about a quarter of the observations lie between the two vertical red lines.

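[Figure: histogram of the 10,000 sampled values, with a black vertical line at $\bar X$ and two red vertical lines enclosing about a quarter of the observations.]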
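A plot like the one described can be drawn in R roughly as follows. This is a sketch, not the answerer's actual code: the `breaks` value, the colors, and placing the red lines at $3/4$ and $1$ are assumptions.

```r
x <- runif(10000)                             # a fresh sample from Unif(0,1)
h <- hist(x, breaks = 20, col = "gray90",     # breaks = 20 is a suggestion, not exact
          main = "10,000 draws from Unif(0,1)")
abline(v = mean(x), lwd = 2)                  # black line at the sample mean X-bar
abline(v = c(3/4, 1), col = "red", lwd = 2)   # red lines bounding roughly 25% of the values
```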