First, the standard deviation is not the average distance to the mean; that average is always zero, as proved below. It is, however, a measure of how far the points typically lie from the mean. Assuming the values are normally distributed, we know, for example, that about 68% of the values lie between $\mu-\sigma$ and $\mu+\sigma$.
Suppose we weigh potatoes with average weight 100 g and standard deviation 5 g. What can we say about the average weight of a group of 4 potatoes?
I hope you see that the average of the average weight is still 100 g. But what is the standard deviation of this average weight? That is where you use the formula
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{4}} =2.5$$
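A quick simulation sketch (plain Python, standard library only; the 100 000 simulated groups are an arbitrary choice of mine) illustrating that the averages of groups of 4 potatoes indeed have a standard deviation close to 2.5 g:

```python
import random
import statistics

random.seed(0)
mu, sigma, n = 100.0, 5.0, 4              # potato weights: mean 100 g, sd 5 g

# average weight of each of 100 000 simulated groups of 4 potatoes
group_means = [
    statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(100_000)
]

print(statistics.fmean(group_means))      # close to 100
print(statistics.stdev(group_means))      # close to sigma / sqrt(n) = 2.5
```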
Feel free to ask if you still don't understand.
Proof that the average signed deviation between the actual data and the mean is $0$:
$$\frac{\sum^n_{i=1} (x_i-\mu)}{n} = \frac{(\sum^n_{i=1} x_i)-\mu n}{n} = \frac{\sum^n_{i=1} x_i}{n}-\mu = \mu - \mu = 0$$
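A numeric check of this identity (a plain-Python sketch; the sample values are arbitrary choices of my own):

```python
import statistics

data = [2.0, 3.5, 7.0, 11.5, 16.0]              # any sample will do
mu = statistics.fmean(data)                      # sample mean
avg_dev = sum(x - mu for x in data) / len(data)  # average signed deviation
print(avg_dev)                                   # 0 up to floating-point rounding
```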
I think this is a good question, as it points to an interesting conceptual issue about what a parameter is. When we talk about the probability density function (PDF) of a particular distribution, we usually refer to one established parameterization out of infinitely many, all of which describe the distribution equally well. Let me explain this with some examples.
The normal distribution: $\sigma$ versus $\sigma^2$
First, notice that the normal distribution can be parameterized in different ways. On Wikipedia, for instance, you will find the parameterization
$$
f_1(x \ | \ \mu,\sigma) \;, \tag{1}
$$
where $\mu$ is called the mean and $\sigma$ the standard deviation. But many people prefer
$$
f_2(x \ | \ \mu,\sigma^2) \;, \tag{2}
$$
where $\sigma^2$ is called the variance. Since $f_1$ and $f_2$ have different signatures, they are two different functions, even though their definitions are identical:
$$
f_1(x | \mu, \sigma) \ = \ \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right) \ = \ f_2(x \ | \ \mu,\sigma^2) \;. \tag{3}
$$
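The point about signatures can be made concrete in code. In this sketch (plain Python; the function names mirror $f_1$ and $f_2$), the two bodies are identical and only the meaning of the dispersion argument differs:

```python
import math

def f1(x, mu, sigma):
    # parameterized by the standard deviation sigma
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

def f2(x, mu, var):
    # parameterized by the variance; same formula, different signature
    return math.exp(-(x - mu)**2 / (2 * var)) / math.sqrt(2 * math.pi * var)

mu, sigma = 0.0, 1.5
print(f1(2.0, mu, sigma))          # same value from both calls
print(f2(2.0, mu, sigma**2))
```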
The normal distribution: precision $h$
Second, if I remember correctly, Gauß used the parameter $h$, which he called precision, and defined the PDF of the normal distribution as
$$
f_3(x \ | \ \mu,h) = \dfrac{h}{\sqrt{\pi}} \exp\left( -h^2(x - \mu)^2 \right) \;. \tag{4}
$$
It is yet another parameterization with a different measure of dispersion, and even though (4) looks different from (3), both equations describe the same distribution (to see this, set $h = \dfrac{1}{\sqrt{2\sigma^2}}$).
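A numeric check that the precision parameterization describes the same density (a self-contained plain-Python sketch; `f1` and `f3` mirror the functions above, and the test points are arbitrary):

```python
import math

def f1(x, mu, sigma):
    # standard parameterization with standard deviation sigma
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

def f3(x, mu, h):
    # Gauss's precision parameterization
    return h / math.sqrt(math.pi) * math.exp(-h**2 * (x - mu)**2)

mu, sigma = 1.0, 2.0
h = 1 / math.sqrt(2 * sigma**2)    # the substitution from the text
for x in [-3.0, 0.0, 1.0, 4.5]:
    print(f1(x, mu, sigma), f3(x, mu, h))   # the two columns agree
```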
The normal distribution: mode and median
Third, notice that $\mu$ is not only the mean of the normal distribution, but it is also the mode and the median. So, if you ask for distributions that can be parameterized with the mode or the median, the normal distribution is again an example. It is just a matter of how you interpret $\mu$.
The log-normal distribution: geometric mean $m$
Let me conclude with the log-normal distribution. It is often defined by relating it to the normal distribution:
$X$ follows a log-normal distribution with parameters $\mu$ and $\sigma$ if and only if $\ln(X)$ follows a normal distribution with mean $\mu$ and standard deviation $\sigma$. In symbols:
$$
\ln(X) \sim \mathcal{N}\left( \mu, \sigma \right) \,\Leftrightarrow\, X \sim \mathcal{L}\left( \mu, \sigma \right)
$$
This is why the following PDF became the established function:
$$
g_1(x \ | \ \mu,\sigma) = \dfrac{1}{\sqrt{2\pi\sigma^2x^2}} \exp\left( -\dfrac{(\ln(x) - \mu)^2}{2\sigma^2} \right) \;. \tag{5}
$$
Notice that $\mu$ and $\sigma$ do not correspond to the mean and the standard deviation of the log-normal distribution. Another, and in my opinion more natural, definition is
$$
g_2(x \ | \ m,\sigma^2) = \dfrac{1}{\sqrt{2\pi\sigma^2 x^2}} \exp\left( -\dfrac{\ln^2(\frac{x}{m})}{2\sigma^2} \right) \;, \tag{6}
$$
where $m$ corresponds to the geometric mean.
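As a sanity check, the two log-normal parameterizations can be compared numerically. The following sketch (plain Python; parameter values and test points are arbitrary choices of mine) evaluates $g_1$ against $g_2$ with $m = e^{\mu}$:

```python
import math

def g1(x, mu, sigma):
    # established log-normal PDF, parameters of the underlying normal
    return math.exp(-(math.log(x) - mu)**2 / (2 * sigma**2)) \
        / math.sqrt(2 * math.pi * sigma**2 * x**2)

def g2(x, m, sigma):
    # parameterization via the geometric mean m = exp(mu)
    return math.exp(-math.log(x / m)**2 / (2 * sigma**2)) \
        / math.sqrt(2 * math.pi * sigma**2 * x**2)

mu, sigma = 0.5, 0.8
m = math.exp(mu)                   # geometric mean of the log-normal
for x in [0.2, 1.0, 3.7]:
    print(g1(x, mu, sigma), g2(x, m, sigma))   # identical values
```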
If the weight $X$ of a randomly poured bag of rice is normally distributed with mean $\mu$ and standard deviation $\sigma = 18$, then the probability that $X < 1000$ is $$\Pr[X < 1000] = \Pr\left[\frac{X - \mu}{\sigma} < \frac{1000 - \mu}{18}\right].$$ If the machine is set so that $\mu = 1015$, then this probability becomes $$\Pr\left[\frac{X-\mu}{\sigma} < \frac{1000-1015}{18}\right] = \Pr[Z < -15/18],$$ where $Z$ is a standard normal random variable with mean $0$ and standard deviation $1$; thus we can look up this probability in a normal distribution table, or use a calculator, to obtain $$\Pr[Z < -15/18] = \Phi(-15/18) \approx 0.202328.$$

Thus, when the machine is set to measure out on average 15 grams more rice than is needed, about $20\%$ of the bags will still be underweight. This makes sense: the standard deviation of the weight of a bag is $18$ grams, so a margin of $15$ grams is still less than one standard deviation.

If we wanted to be extra sure, we could set the machine to pour out, on average, 3 standard deviations above the required $1000$ grams, i.e. an extra $3(18) = 54$ grams of rice. The probability of a bag being underweight would then be only about $0.00135$, much less likely. But the cost of incorporating such an allowance should also be considered, since it would mean that on average you're giving away roughly an extra $1/18^{\rm th}$ of a bag.
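These probabilities can be reproduced with a short plain-Python sketch (the helper `Phi` is my own name for the standard normal CDF, built from `math.erf`):

```python
import math

def Phi(z):
    # standard normal CDF, expressed via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

sigma = 18.0
# machine set to mu = 1015 g: probability a bag is underweight
print(Phi((1000 - 1015) / sigma))   # about 0.2023
# machine set to mu = 1000 + 3 * sigma = 1054 g
print(Phi((1000 - 1054) / sigma))   # about 0.00135
```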