We do not assume this, because it's not true. What is true is that, as the sample gets larger, the sample mean is increasingly likely to be close to the population mean. A simple way to justify this is as follows. If we have
$$Y = \frac{1}{n} \sum_{i=1}^n Y_i$$
where the $Y_i$ are iid samples from some distribution, then we can compute not only that the expected value $\mathbb{E}(Y) = \mathbb{E}(Y_i) = \mu$ matches up, but also that
$$\text{Var}(Y) = \frac{1}{n} \text{Var}(Y_i) = \frac{\sigma^2}{n}$$
so that the variance of $Y$ decreases as $n$ gets large; here $\sigma$ is the population standard deviation. Chebyshev's inequality now implies that
$$\mathbb{P}(|Y - \mu| \ge r \sigma) \le \frac{1}{r^2 n}.$$
Hence, for example, setting $r = 1$ gives that the probability that $Y$ is outside $[\mu - \sigma, \mu + \sigma]$ (is not within one standard deviation of the population mean) is at most $\frac{1}{n}$. So, for example, if we wanted to get within one standard deviation with probability at least 95%, it would suffice to use a sample of $n = 20$. Note that this bound does not depend on the size of the population, which we're effectively assuming is infinite.
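If it helps to see this concretely, here is a minimal Monte Carlo sketch (in Python, assuming NumPy is available) that estimates $\mathbb{P}(|Y - \mu| \ge \sigma)$ for $n = 20$ and compares it with the Chebyshev bound $1/n$; the exponential population with $\mu = \sigma = 1$ is an arbitrary choice for illustration.

```python
import numpy as np

# Minimal Monte Carlo check of the Chebyshev bound P(|Y - mu| >= sigma) <= 1/n.
# The exponential(1) population (mu = sigma = 1) is an arbitrary illustrative choice.
rng = np.random.default_rng(0)
mu, sigma = 1.0, 1.0
n, trials = 20, 100_000

samples = rng.exponential(scale=1.0, size=(trials, n))
ybar = samples.mean(axis=1)                        # one sample mean per trial
freq_outside = np.mean(np.abs(ybar - mu) >= sigma)

print(f"empirical P(|Y - mu| >= sigma) = {freq_outside:.4f}")
print(f"Chebyshev bound 1/n            = {1/n:.4f}")
# The empirical frequency is typically far below 1/n = 0.05, consistent with
# Chebyshev being a loose, distribution-free bound.
```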
Actually we expect much better decay than this, which can be made precise using e.g. Chernoff bounds, although the central limit theorem is maybe a more intuitive way to think about these things.
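To get a feel for how loose the Chebyshev bound is, here is a small sketch comparing $1/(r^2 n)$ at $r = 1$ with the normal approximation suggested by the central limit theorem, under which a deviation of $\sigma$ corresponds to $\sqrt{n}$ standard errors of the sample mean. The values of $n$ are arbitrary, and the CLT figure is only an approximation, not a rigorous bound (Chernoff-type bounds are what make the exponential decay precise).

```python
from math import erf, sqrt

# Compare the Chebyshev bound on P(|Y - mu| >= sigma) with the CLT approximation
# 2 * Phi(-sqrt(n)), since sigma is sqrt(n) standard errors of the sample mean.
def std_normal_cdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

for n in (5, 10, 20):
    chebyshev = 1 / n
    clt = 2 * std_normal_cdf(-sqrt(n))
    print(f"n = {n:2d}: Chebyshev <= {chebyshev:.3f}, CLT approx ~ {clt:.2e}")
```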
The expected value of an estimator should equal the "theoretical" variance (in the case of an unbiased estimator); any particular numerical result may differ. In fact, in many applications the "theoretical" variance is not known at all.
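As a concrete illustration of that remark, the following sketch (assuming NumPy, with an arbitrary normal population of variance 4) averages the unbiased sample variance over many samples: the average comes out close to the theoretical variance even though any single realization does not.

```python
import numpy as np

# The unbiased sample variance averaged over many samples matches the
# "theoretical" variance, while any single realization differs from it.
# The normal(0, 2) population (variance 4) is an arbitrary illustrative choice.
rng = np.random.default_rng(1)
sigma2 = 4.0
n, trials = 30, 50_000

samples = rng.normal(loc=0.0, scale=2.0, size=(trials, n))
s2 = samples.var(axis=1, ddof=1)          # unbiased sample variances

print(f"one realization of s^2 : {s2[0]:.3f}")
print(f"average of s^2         : {s2.mean():.3f}  (theoretical variance = {sigma2})")
```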
Best Answer
It is not true that the sample mean is the 'best' choice of estimator of the population mean for every underlying parent distribution. The only thing that is true regardless of the population distribution is that the sample mean is an unbiased estimator of the population mean, i.e. $E(\overline X)=\mu$.
Now unbiasedness is often not the only criterion considered when choosing an estimator of your unknown quantity of interest. We usually prefer estimators that have smaller variance, or smaller mean squared error (MSE) in general, because these are desirable properties for an estimator to have. And it might be the case that $\overline X$ does not attain the minimum variance/MSE among all possible estimators.
Consider a sample $(X_1,X_2,\ldots,X_n)$ drawn from a uniform distribution on $(0,\theta)$. Now $T_1=\overline X$ is an unbiased estimator of the population mean $\theta/2$, but it does not attain the minimum variance among all unbiased estimators of $\theta/2$. It can be shown that the uniformly minimum variance unbiased estimator (UMVUE) of the population mean is instead $T_2=\frac{n+1}{2n}\max(X_1,\ldots,X_n)$. So $T_2$ is the best estimator within the unbiased class where 'best' means 'having the smallest variance'.
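A quick simulation makes the gap visible. The following sketch (assuming NumPy, with the arbitrary choices $\theta = 1$ and $n = 10$) estimates the mean and variance of $T_1$ and $T_2$ by Monte Carlo; both come out approximately unbiased for $\theta/2$, but $T_2$ has a much smaller variance.

```python
import numpy as np

# Monte Carlo comparison of the two unbiased estimators of theta/2 above:
# T1 = sample mean, T2 = (n+1)/(2n) * max(X_i), for a Uniform(0, theta) sample.
# theta = 1 and n = 10 are arbitrary choices for illustration.
rng = np.random.default_rng(2)
theta, n, trials = 1.0, 10, 200_000

x = rng.uniform(0.0, theta, size=(trials, n))
t1 = x.mean(axis=1)
t2 = (n + 1) / (2 * n) * x.max(axis=1)

print(f"target theta/2 = {theta / 2}")
print(f"T1: mean = {t1.mean():.4f}, variance = {t1.var():.6f}")   # ~ theta^2 / (12 n)
print(f"T2: mean = {t2.mean():.4f}, variance = {t2.var():.6f}")   # ~ theta^2 / (4 n (n + 2))
```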