It would depend on the details of the distribution family. For normal distributions, what you said would be true; for a more heavy-tailed distribution, it might not. You could, for instance, check with a t-distribution with low degrees of freedom.
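One way to run that check is a quick simulation: draw from a standard normal and from a t-distribution with 3 degrees of freedom, and count the fraction of draws within one (sample) standard deviation of the mean. This is a minimal sketch, assuming numpy; the seed and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000

# Fraction of draws within one standard deviation of the mean,
# for a normal distribution vs. a heavy-tailed t-distribution (3 df).
for name, draws in [("normal", rng.normal(size=n)),
                    ("t, df=3", rng.standard_t(df=3, size=n))]:
    within = np.mean(np.abs(draws - draws.mean()) < draws.std())
    print(f"{name}: {within:.3f}")
```

For the normal case this lands near 0.68, while for the heavy-tailed t-distribution a single standard deviation covers noticeably more of the mass, so the 68% rule is specific to the normal family.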
Suppose you want to know the average height of adults in a given country. Suppose further that, if you could measure every adult, the heights would follow a normal distribution. This distribution then has two important parameters: the mean $\mu$, which is the center of the distribution, and the standard deviation $\sigma$, which is a measure of spread around that center. In this scenario, one standard deviation around the mean captures 68% of the data points, two standard deviations capture 95%, and three standard deviations capture 99.7% (the so-called 68–95–99.7 rule).
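These three percentages can be computed exactly from the normal CDF, since $P(|X-\mu| < k\sigma) = \operatorname{erf}(k/\sqrt{2})$. A short check using only the standard library:

```python
from math import erf, sqrt

# Probability mass of a normal distribution within k standard
# deviations of the mean: P(|X - mu| < k*sigma) = erf(k / sqrt(2)).
for k in (1, 2, 3):
    print(f"{k} sd: {erf(k / sqrt(2)):.4f}")
# 1 sd: 0.6827
# 2 sd: 0.9545
# 3 sd: 0.9973
```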
However, in almost all cases you cannot measure every member of a population; instead, you have to rely on random samples from that distribution and estimate how far the sample mean is likely to be from the true population mean $\mu$. If this is your goal, you calculate the standard error of the mean. One standard error of the mean then defines the interval around the sample mean that would contain the true population mean about 68% of the time if sampling were repeated over and over again. In statistics a 95% confidence interval is more commonly used, which you obtain by multiplying the standard error by roughly 2 (more precisely, 1.96). Given the formula for the standard error of the mean, $SE = \sigma/\sqrt{n}$, it is also apparent that as the sample size $n$ grows, the interval shrinks toward zero and you close in on the population mean $\mu$ (as in your quote above). Thus, the standard error of the mean is a tool of inferential statistics: inferring properties of an underlying unknown distribution (the population) from a random sample (the observed data).
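Both properties — the roughly 68% coverage of a one-SE interval and the shrinkage as $n$ grows — can be verified by simulation. A sketch under assumed population parameters (heights with $\mu = 170$, $\sigma = 10$ are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 170.0, 10.0   # hypothetical population of heights
reps = 10_000             # number of repeated samples

for n in (25, 100, 400):
    samples = rng.normal(mu, sigma, size=(reps, n))
    means = samples.mean(axis=1)
    # Standard error of the mean, estimated from each sample: s / sqrt(n)
    sems = samples.std(axis=1, ddof=1) / np.sqrt(n)
    # How often does mu fall within one SE of the sample mean?
    covered = np.mean(np.abs(means - mu) < sems)
    print(f"n={n}: average SE = {sems.mean():.3f}, coverage ~ {covered:.3f}")
```

The coverage hovers around 0.68 for every sample size, while the average standard error halves each time $n$ quadruples, matching $\sigma/\sqrt{n}$.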
The standard deviation, on the other hand, describes the variability in the observed data (i.e., the sample) alone, without making any inferences about the underlying unknown distribution. It is therefore commonly used in descriptive statistics.
So, depending on whether you want to infer properties of an unknown distribution from a random sample (which is usually what we are after when doing statistics) or simply describe the variability in your sample, you should report the standard error of the mean or the standard deviation, respectively.
Imagine a population where the true mean is 100. You have a sample of 101, 103, 97, 99. You increase the sample size by 1 and pull out a value of 120. Has the sample mean gotten closer to or farther from the population mean?
At most you could say that "mostly" the sample mean gets closer to the population mean with larger sample size. This can be quantified, of course...
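One way to quantify "mostly": simulate many 5-observation samples from a normal population (mean 100, standard deviation 10 — the spread is a hypothetical choice) and check how often the fifth draw moves the sample mean closer to the truth, mirroring the example above.

```python
import numpy as np

rng = np.random.default_rng(7)
mu, reps = 100.0, 10_000

# For each replicate, compare the error of the mean of the first 4 draws
# with the error after adding a 5th draw.
draws = rng.normal(mu, 10.0, size=(reps, 5))
err4 = np.abs(draws[:, :4].mean(axis=1) - mu)
err5 = np.abs(draws.mean(axis=1) - mu)
print("fraction of replicates where the extra draw helped:",
      np.mean(err5 < err4))
```

The fraction comes out comfortably above one half but well below one: the extra observation usually helps, yet individual unlucky draws (like the 120 above) can temporarily pull the sample mean away.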