You have misinterpreted the article. The passage you are looking at never says anything about the actual population variance.
The passage literally says:
Now a question arises: is the estimate of the population variance that arises in this way using the sample mean always smaller than what we would get if we used the population mean?
The pronoun "what" refers to an estimate of the population variance. To spell it out more explicitly, the article compares two ways of estimating the population variance:
1. Subtract the sample mean from each observed value in the sample. Take the square of each difference. Add the squares. Divide by the number of observations.
2. Subtract the population mean from each observed value in the sample. Take the square of each difference. Add the squares. Divide by the number of observations.
The article then says that Method 1 always gives a smaller result except in the case where the sample mean happens to be exactly the same as the population mean, in which case both methods give the same result.
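This claim is easy to check numerically. Here is a minimal sketch, assuming for illustration a population whose mean is known to be $0$ (the sample is drawn from a standard normal, so that assumption holds by construction):

```python
import random

random.seed(0)

# Illustrative assumption: the population mean is known to be 0,
# because we draw the sample from random.gauss(0, 1).
pop_mean = 0.0
sample = [random.gauss(pop_mean, 1.0) for _ in range(10)]
n = len(sample)
sample_mean = sum(sample) / n

# Method 1: squared deviations from the sample mean, divided by n.
method1 = sum((x - sample_mean) ** 2 for x in sample) / n
# Method 2: squared deviations from the population mean, divided by n.
method2 = sum((x - pop_mean) ** 2 for x in sample) / n

print(method1 <= method2)  # Method 1 is never larger than Method 2.
```

Whatever sample you draw, `method1` comes out no larger than `method2`; in fact the two differ by exactly the square of the gap between the sample mean and the population mean.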
This is a simple consequence of the not-so-simple fact that if you take any finite list of numbers $(x_1, x_2, \ldots, x_n)$ and consider the function $f(m)$ defined by
$$ f(m) = (x_1 - m)^2 + (x_2 - m)^2 + \cdots + (x_n - m)^2, $$
the smallest value of $f(m)$ occurs when $m$ is the mean of that list of numbers,
that is, when $m$ is the sample mean.
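A quick numerical illustration of this fact, using a made-up list of numbers:

```python
# A small made-up list; its mean is 5.0.
xs = [2.0, 3.0, 7.0, 8.0]
mean = sum(xs) / len(xs)

def f(m):
    """Sum of squared deviations of the list from m."""
    return sum((x - m) ** 2 for x in xs)

# f is smallest at the mean; any other trial value gives more.
print(f(mean))        # 26.0
print(f(mean + 0.5))  # 27.0
print(f(mean - 1.0))  # 30.0
```

The pattern is no accident: completing the square shows $f(m) = f(\bar{x}) + n(m - \bar{x})^2$, so moving $m$ away from the mean always adds a nonnegative amount.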
Notice that none of the preceding statements compared anything with the actual population variance. The actual population variance could be unknown.
All the above statements are concerned only with estimates of the variance.
All of this does not mean that every sample will underestimate the population variance. We might draw a sample in which the data values are unusually far from the population mean. But in that case the variance we would compute using Method 2 would overestimate the variance by even more than Method 1.
While this can happen, it is not the typical outcome. More often, the variance computed by Method 2 is close to or below the true population variance, and the variance computed by Method 1 is simply too small.
That's the thing about statistics like this: even a bad method will sometimes give you a correct result just by luck.
The expected value of an unbiased estimator equals the "theoretical" variance; any particular numerical result may differ from it. In fact, in many applications the theoretical variance is not known at all.
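A simulation sketch of this point, assuming a standard normal population (true variance $1$): averaging Method 1 over many samples comes out biased low by a factor of $(n-1)/n$, while dividing by $n-1$ instead averages out to the true variance.

```python
import random

random.seed(1)

# Assumption for this sketch: the population is standard normal,
# so the true variance is exactly 1.0.
n, trials = 5, 100_000

total_biased = 0.0    # divide by n   (Method 1)
total_unbiased = 0.0  # divide by n-1 (the usual sample variance)
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)
    total_biased += ss / n
    total_unbiased += ss / (n - 1)

print(total_biased / trials)    # close to (n-1)/n * 1.0 = 0.8
print(total_unbiased / trials)  # close to 1.0
```

With $n = 5$ the bias factor $(n-1)/n = 0.8$ is large enough to see clearly; for bigger samples the two estimators differ much less.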
Best Answer
It is probably more understandable to refer to the sample variance as the unbiased estimator of the overall variance. Usually, the set of numbers from which we are estimating these values does not reflect the entire universe of possibilities; it is a sample from which we want to make some inferences. We want to use this sample to estimate the mean and variance not of the sample itself, but of the underlying distribution. Understanding this, and running through the algebra (which can be found here), you see that the statistic known as the "sample variance" is the one whose expected value, if it were calculated on a hundred gazillion separate samples from the underlying distribution, would equal the variance of the underlying distribution. When an estimator's expected value is the actual value in which we are interested, we call it "unbiased". If we were instead to apply the statistic known as the "population" variance to a huge set of samples and take ITS expectation (mean), it would be slightly lower than the "true" variance.
If, however, you have the entire (finite) population, and not a sample from it, such as the distribution of a six-sided die, then the population variance is the statistic to use, as you are not estimating it from a sample, but calculating it from the complete probability space.
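For the die example, the entire population of six equally likely faces is in hand, so the mean and variance can be computed exactly rather than estimated (exact fractions are used here to avoid rounding):

```python
from fractions import Fraction

# The full finite population: the six faces of a fair die.
faces = [Fraction(k) for k in range(1, 7)]

# Population mean and population variance, computed directly
# over the complete probability space (divide by N, not N-1).
mean = sum(faces) / len(faces)
var = sum((x - mean) ** 2 for x in faces) / len(faces)

print(mean)  # 7/2
print(var)   # 35/12
```

Note the divisor is the population size $N = 6$, not $N - 1$: nothing is being estimated here, so there is no bias to correct.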