Note that the sample mean $\bar{X}$ is also normally distributed, with mean $\mu$ and variance $\sigma^2/n$. This means that
$$\operatorname E(\bar{X}^2) = [\operatorname E(\bar{X})]^2 + \operatorname{Var}(\bar{X}) = \mu^2 + \frac{\sigma^2}{n}$$
If all you care about is an unbiased estimate, you can use the fact that the sample variance is unbiased for $\sigma^2$. This implies that the estimator
$$\widehat{\mu^2} = \bar{X}^2 - \frac{S^2}n$$
is unbiased for $\mu^2$.
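If you want a quick numerical sanity check of that claim, here is a small NumPy sketch; the values of $\mu$, $\sigma$, $n$, and the number of replications below are arbitrary choices, not anything from the original problem.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 2.0, 3.0, 10          # arbitrary values for illustration
reps = 200_000                       # number of simulated samples

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)           # unbiased sample variance S^2

print("E[xbar^2] (theory)     :", mu**2 + sigma**2 / n)        # 4.9
print("mean of xbar^2         :", (xbar**2).mean())            # ~4.9, biased for mu^2
print("mean of xbar^2 - S^2/n :", (xbar**2 - s2 / n).mean())   # ~4.0 = mu^2
```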
In that paragraph the authors are giving an extreme example to show how being unbiased doesn't mean that a random variable is converging on anything.
The authors are taking a random sample $X_1,\dots, X_n \sim \mathcal N(\mu,\sigma^2)$ and want to estimate $\mu$. Noting that $E(X_1) = \mu$, we could produce an unbiased estimator of $\mu$ by just ignoring all of our data except the first point $X_1$. But that's clearly a terrible idea, so unbiasedness alone is not a good criterion for evaluating an estimator. Somehow, as we get more data, we want our estimator to vary less and less from $\mu$, and that's exactly what consistency says: for any distance $\varepsilon$, the probability that $\hat \theta_n$ is more than $\varepsilon$ away from $\theta$ goes to $0$ as $n \to \infty$. And this can happen even if, for every finite $n$, $\hat \theta_n$ is biased. An example of this is the variance estimator $\hat \sigma^2_n = \frac 1n \sum_{i=1}^n(y_i - \bar y_n)^2$ in a normal sample. This is biased but consistent.
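Here is a small simulation sketch of that last point (the values of $\mu$ and $\sigma^2$ are arbitrary): the divide-by-$n$ estimator settles down on $\sigma^2$ as $n$ grows, even though it is biased at every fixed $n$.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2 = 0.0, 4.0                # arbitrary true parameters

for n in (10, 100, 1_000, 10_000, 100_000):
    y = rng.normal(mu, np.sqrt(sigma2), size=n)
    sigma2_hat = ((y - y.mean()) ** 2).mean()   # divide by n: biased, but consistent
    print(f"n = {n:>6}:  sigma2_hat = {sigma2_hat:.4f}   (true sigma^2 = {sigma2})")
```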
Intuitively, a statistic is unbiased if it exactly equals the target quantity when averaged over all possible samples. But we know that the average of a bunch of things doesn't have to be anywhere near the things being averaged; this is just a fancier version of how the average of $0$ and $1$ is $1/2$, although neither $0$ nor $1$ is particularly close to $1/2$ (depending on how you measure "close").
Here's another example (although this is almost just the same example in disguise). Let $X_1 \sim \text{Bern}(\theta)$ and let $X_2 = X_3 = \dots = X_1$. Our estimator of $\theta$ will be $\hat \theta(X) = \bar X_n$. Note that $E \bar X_n = \theta$, so we do indeed have an unbiased estimator. But $\bar X_n = X_1 \in \{0,1\}$, so this estimator definitely isn't converging on anything close to $\theta \in (0,1)$, and for every $n$ we actually still have $\bar X_n \sim \text{Bern}(\theta)$.
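A tiny simulation of this construction (with an arbitrary choice of $\theta$) makes the point concrete: the estimator averages out to $\theta$ across replications, yet in any single sample it only ever takes the values $0$ and $1$.

```python
import numpy as np

rng = np.random.default_rng(2)
theta = 0.3                              # arbitrary true parameter in (0, 1)
reps = 100_000

# Since X_2 = X_3 = ... = X_1, the sample mean equals X_1 for every n.
x1 = rng.binomial(1, theta, size=reps)   # one realisation of X_1 per replication
print("average of xbar_n over replications:", x1.mean())      # ~0.3, so unbiased
print("values xbar_n actually takes       :", np.unique(x1))  # [0 1], never near 0.3
```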
Best Answer
You can find everything here. However, here is a brief answer.
Let $\mu$ and $\sigma^2$ be the mean and the variance of interest; you wish to estimate $\sigma^2$ based on a sample of size $n$.
Now, let us say you use the following estimator:
$S^2 = \frac{1}{n} \sum_{i=1}^n (X_{i} - \bar{X})^2$,
where $\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i$ is the estimator of $\mu$.
It is not too difficult (see footnote) to see that $E[S^2] = \frac{n-1}{n}\sigma^2$.
Since $E[S^2] \neq \sigma^2$, the estimator $S^2$ is said to be biased.
But, observe that $E[\frac{n}{n-1} S^2] = \sigma^2$. Therefore $\tilde{S}^2 = \frac{n}{n-1} S^2$ is an unbiased estimator of $\sigma^2$.
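If you'd like to see both facts numerically, here is a short NumPy sketch; the values of $\mu$, $\sigma^2$ and $n$ are arbitrary, with $n$ kept small so the bias is easy to see.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma2, n = 5.0, 2.0, 5          # small n so the bias is visible
reps = 200_000

x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
s2       = x.var(axis=1, ddof=0)     # divide by n   -> E[S^2] = (n-1)/n * sigma^2
s2_tilde = x.var(axis=1, ddof=1)     # divide by n-1 -> E[S~^2] = sigma^2

print("(n-1)/n * sigma^2 :", (n - 1) / n * sigma2)   # 1.6
print("mean of S^2       :", s2.mean())              # ~1.6
print("mean of S~^2      :", s2_tilde.mean())        # ~2.0
```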
Footnote
Start by writing $(X_i - \bar{X})^2 = ((X_i - \mu) + (\mu - \bar{X}))^2$ and then expand the square...
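In case it helps, spelling that expansion out (the standard argument) gives
$$\sum_{i=1}^n (X_i - \bar{X})^2 = \sum_{i=1}^n (X_i - \mu)^2 - n(\bar{X} - \mu)^2,$$
and taking expectations, using $E[(X_i-\mu)^2] = \sigma^2$ and $E[(\bar{X}-\mu)^2] = \operatorname{Var}(\bar{X}) = \sigma^2/n$,
$$E\left[\sum_{i=1}^n (X_i - \bar{X})^2\right] = n\sigma^2 - n\cdot\frac{\sigma^2}{n} = (n-1)\sigma^2,$$
so dividing by $n$ yields $E[S^2] = \frac{n-1}{n}\sigma^2$.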
Edit to account for your comments
The expected value of $S^2$ does not give $\sigma^2$ (and hence $S^2$ is biased) but it turns out you can transform $S^2$ into $\tilde{S}^2$ so that the expectation does give $\sigma^2$.
In practice, one often prefers to work with $\tilde{S}^2$ instead of $S^2$. But, if $n$ is large enough, this is not a big issue since $\frac{n}{n-1} \approx 1$.
Remark: Note that unbiasedness is a property of an estimator, not of an expectation as you wrote.