After reviewing the literature, I don't think there's any legitimate sense in which the standard deviation is $(c-a)/6$ (with $a$ the optimistic and $c$ the pessimistic estimate). The fixed denominator appears because the PERT estimate is calculated as a weighted sum of three points, counting the most likely point $b$ four times:
$$
d = \frac{a + 4b + c}{6}.
$$
So in a sense, it seems like they're acting as if they have six points. But the numerator of that SD is quite magical to me.
Edit:
Ah, so my guess about the denominator was completely wrong (I was wondering why it wasn't $\sqrt{6}$ anyway). They are "backing into" the notion of standard deviation by saying that the distance from the optimistic to the pessimistic estimate should span $6\sigma$. Then, as seen in the snippet above, they treat each path as a sum of independent random variables and get the standard deviation along the path as $\sqrt{\sigma_1^2 + \dots + \sigma_k^2}$. It's more reasonable than I initially gave it credit for.
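To make the arithmetic concrete, here's a minimal sketch of the scheme described above. The function names and the example task numbers are mine, and independence of task durations along a path is an assumption, not something PERT guarantees:

```python
import math

# a = optimistic, b = most likely, c = pessimistic estimate for one task.

def pert_mean(a, b, c):
    """Weighted three-point estimate: (a + 4b + c) / 6."""
    return (a + 4 * b + c) / 6

def pert_sd(a, c):
    """'Backed-into' SD: the optimistic-to-pessimistic range is treated as 6 sigma."""
    return (c - a) / 6

def path_estimate(tasks):
    """Sum means along a path; variances add if task durations are independent."""
    mean = sum(pert_mean(a, b, c) for a, b, c in tasks)
    sd = math.sqrt(sum(pert_sd(a, c) ** 2 for a, _, c in tasks))
    return mean, sd

# Hypothetical path of three tasks, each given as (a, b, c):
tasks = [(2, 4, 8), (1, 2, 3), (3, 5, 13)]
mean, sd = path_estimate(tasks)
```

Note that the path SD comes out smaller than the sum of the individual SDs, which is exactly the effect of adding variances rather than standard deviations.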
The units of the average of a quantity are always the units of the quantity itself. Your mistake is in the way you're interpreting the average.
Suppose we want to take the time-average value of a quantity (say "mass") that is a function of time, $M(t)$, between times $t_1$ and $t_2.$ The time-average value is
$$
\langle M\rangle = \frac{\int_{t_1}^{t_2} M(t)\, dt}{t_2 - t_1}.
$$
The numerator has units of "mass × time" and the denominator has units of "time", so the overall units of the average are "mass".
What you're doing is approximating the mass as constant over a period of two days. If $M(t)$ has the constant value $m$ over the window from $t_1$ to $t_2$, then the average becomes $m (t_2 - t_1) / (t_2 - t_1) = m.$ You're still doing the integral above, but it's a piecewise integral of constant values in each of their time windows. That's the approximation you're making.
If we have times $t_0, t_1, t_2, \dots, t_N$ in equally spaced increments with $t_i - t_{i-1} = \delta t$, and approximate $M(t) \approx m_i$ for $t_{i-1} < t \le t_i$ then
$$
\begin{split}
\langle M\rangle &= \frac{1}{t_N - t_0}\int_{t_0}^{t_N} M(t) dt \\
&\approx \frac{1}{N \delta t} \sum_{i=1}^N m_i \delta t \\
&= \frac{1}{N} \sum_i m_i,
\end{split}
$$
which is just the sample mean. Likewise,
$$
\begin{split}
Var(M) &= \frac{1}{t_N - t_0} \int_{t_0}^{t_N} (M(t) - \langle M \rangle )^2 dt \\
&\approx \frac{1}{N \delta t} \sum_{i=1}^N (m_i - \langle M \rangle)^2 \delta t \\
&= \frac{1}{N} \sum_{i=1}^N (m_i - \langle M \rangle )^2
\end{split}
$$
which is just the sample variance.
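The reduction above is easy to check numerically. In this sketch, the window width and the values $m_i$ are arbitrary choices for illustration:

```python
# For a piecewise-constant M(t), the time-average and time-variance
# reduce to the sample mean and (population) sample variance.

m = [3.0, 7.0, 4.0, 6.0]   # constant value m_i in each window
dt = 2.0                   # equal window width (e.g. two days)
T = len(m) * dt            # total time t_N - t_0

# Time-average: integral of M(t) dt over [t_0, t_N], divided by total time.
time_avg = sum(mi * dt for mi in m) / T
sample_mean = sum(m) / len(m)

# Time-variance of M(t), versus the plain sample variance of the m_i.
time_var = sum((mi - time_avg) ** 2 * dt for mi in m) / T
sample_var = sum((mi - sample_mean) ** 2 for mi in m) / len(m)
```

The `dt` factors cancel exactly as in the derivation, so `time_avg == sample_mean` and `time_var == sample_var` regardless of the window width.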
The factor of $\sqrt{\frac{2}{\pi}}$ is based on assuming a normal distribution.
If that were a good scale factor to use, it would mean that to compute the sd from the md in large samples, you'd multiply by $\sqrt{\frac{\pi}{2}}$.
If the data are not close to normal, using that scale factor may not yield a suitable estimate of sample standard deviation.
Considered in terms of sample features, the two respond differently to large and small deviations, so in some samples the ratio of mean deviation (md) to sd may be very close to 1, while in other samples it may be far from 1. [I use md for mean deviation because MAD is often used to stand for median absolute deviation from the median.]
i) consider a sample of 1000 0's and 1000 1's. md/sd $\approx$ 1
ii) consider a sample of one "0", one "1" and 998 "$\frac{1}{2}$". md/sd $\approx$ 0.0447
If you were in case (i) and multiplied md by $\sqrt{\frac{\pi}{2}}$ you'd get a number that was about 25% too big. If you were in case (ii) and multiplied md by $\sqrt{\frac{\pi}{2}}$ you'd get a number that was only about 5.6% as big as it should be.
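The two cases are quick to reproduce. This sketch uses the population sd (divide by $n$), matching the ratios quoted above; with the $n-1$ divisor the numbers barely change at these sample sizes:

```python
import math

def md_sd(xs):
    """Mean absolute deviation from the mean, and population sd."""
    n = len(xs)
    mean = sum(xs) / n
    md = sum(abs(x - mean) for x in xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return md, sd

case_i = [0] * 1000 + [1] * 1000        # md/sd = 1 exactly
case_ii = [0, 1] + [0.5] * 998          # md/sd is about 0.0447

md1, sd1 = md_sd(case_i)
md2, sd2 = md_sd(case_ii)

scale = math.sqrt(math.pi / 2)
# scale * md1 overshoots sd1 by roughly 25%;
# scale * md2 is only about 5.6% of sd2.
```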
Mean deviation won't exceed standard deviation, but in some cases it can be quite a lot smaller than it. In particular, if tails are heavier than normal, md/sd might be a good deal smaller than in the normal case.
If you have other information besides the mean deviation, you may be able to approximate the standard deviation a little better.