Consider any location-scale family determined by a "standard" distribution $F$,
$$\Omega_F = \left\{F_{(\mu, \sigma)}: x \to F\left(\frac{x-\mu}{\sigma}\right) \mid \sigma \gt 0\right\}.$$
Assuming $F$ is differentiable with density $f = F'$, we readily find that the PDFs are $\frac{1}{\sigma}f\left((x-\mu)/\sigma\right)dx$.
Truncating these distributions to restrict their support between $a$ and $b$, $a \lt b$, means that the PDFs are replaced by
$$f_{(\mu, \sigma; a,b)}(x) = \frac{f\left(\frac{x-\mu}{\sigma}\right)dx}{\sigma C(\mu, \sigma, a, b)}, \quad a \le x \le b$$
(and are zero for all other values of $x$) where $C(\mu, \sigma, a, b) = F_{(\mu,\sigma)}(b) - F_{(\mu,\sigma)}(a)$ is the normalizing factor needed to ensure that $f_{(\mu, \sigma; a, b)}$ integrates to unity. (Note that $C$ is identically $1$ in the absence of truncation.) The log likelihood for iid data $x_i$ therefore is
$$\Lambda(\mu, \sigma) = \sum_i \left[\log{f\left(\frac{x_i-\mu}{\sigma}\right)} - \log{\sigma}-\log{C(\mu, \sigma, a, b)}\right].$$
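(If a computational check helps, here is a minimal sketch of this log likelihood in code, using the standard normal as the reference distribution purely for illustration; the function name and defaults below are my own, not part of any particular library.)

```python
# A minimal sketch of the truncated log-likelihood Lambda(mu, sigma) above.
# The standard normal is used as the reference distribution F purely for
# illustration; any log-density/CDF pair could be substituted.
import numpy as np
from scipy.stats import norm

def truncated_loglik(mu, sigma, x, a, b, logpdf=norm.logpdf, cdf=norm.cdf):
    """Sum over i of log f((x_i - mu)/sigma) - log sigma - log C(mu, sigma, a, b)."""
    z = (np.asarray(x) - mu) / sigma
    log_C = np.log(cdf((b - mu) / sigma) - cdf((a - mu) / sigma))
    return np.sum(logpdf(z) - np.log(sigma) - log_C)
```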
Critical points (including any global maxima) are found where either $\sigma=0$ (a special case I will ignore here) or the gradient vanishes. Using subscripts to denote derivatives, we may formally compute the gradient and write the likelihood equations as
$$\eqalign{
0 &= \frac{\partial\Lambda}{\partial\mu} &= \sum_i \left[\frac{-f_\mu\left(\frac{x_i-\mu}{\sigma}\right)}{f\left(\frac{x_i-\mu}{\sigma}\right)} -\frac{C_\mu(\mu,\sigma,a,b)}{C(\mu,\sigma,a,b)}\right] \\
0 &= \frac{\partial\Lambda}{\partial\sigma} &= \sum_i \left[\frac{-f_\sigma\left(\frac{x_i-\mu}{\sigma}\right)}{\sigma^2f\left(\frac{x_i-\mu}{\sigma}\right)} -\frac{1}{\sigma}-\frac{C_\sigma(\mu,\sigma,a,b)}{C(\mu,\sigma,a,b)}\right]
}$$
Because $a$ and $b$ are fixed, drop them from the notation and write $nC_\mu(\mu, \sigma)/C(\mu, \sigma)$ as $A(\mu,\sigma)$ and $nC_\sigma(\mu, \sigma)/C(\mu, \sigma)$ as $B(\mu, \sigma)$. (With no truncation, both functions would be identically zero.) Separating the terms involving the data from the rest gives
$$\eqalign{
-A(\mu,\sigma) &= \sum_i \frac{f_\mu\left(\frac{x_i-\mu}{\sigma}\right)}{f\left(\frac{x_i-\mu}{\sigma}\right)} \\
-\sigma^2 B(\mu,\sigma) - n\sigma &= \sum_i \frac{f_\sigma\left(\frac{x_i-\mu}{\sigma}\right)}{f\left(\frac{x_i-\mu}{\sigma}\right)}
}$$
By comparing these to the no-truncation situation it is evident that

- Any sufficient statistics for the original problem are sufficient for the truncated problem (because the right-hand sides have not changed).
- Our ability to find closed-form solutions relies on the tractability of $A$ and $B$. If these do not involve $\mu$ and $\sigma$ in simple ways, we cannot hope to obtain closed-form solutions in general.
For the case of a normal family, $C(\mu,\sigma,a,b)$ is of course a difference of normal CDF values, which can be written in terms of error functions: there is no chance that a closed-form solution can be obtained in general. However, there are only two sufficient statistics (the sample mean and variance will do) and the CDF is as smooth as can be, so numerical solutions will be relatively easy to obtain.
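As a rough sketch of such a numerical solution (the data and starting values below are illustrative assumptions; this simply minimizes the negative truncated log likelihood with a general-purpose optimizer):

```python
# Numerical MLE for a truncated normal: minimize the negative truncated
# log-likelihood with a general-purpose optimizer. All data and starting
# values below are illustrative assumptions.
import numpy as np
from scipy.stats import norm, truncnorm
from scipy.optimize import minimize

a, b = 0.0, 5.0
true_mu, true_sigma = 1.0, 2.0
x = truncnorm.rvs((a - true_mu) / true_sigma, (b - true_mu) / true_sigma,
                  loc=true_mu, scale=true_sigma, size=1000, random_state=42)

def neg_loglik(theta):
    mu, log_sigma = theta                    # optimize log(sigma) so that sigma > 0
    sigma = np.exp(log_sigma)
    z = (x - mu) / sigma
    C = norm.cdf((b - mu) / sigma) - norm.cdf((a - mu) / sigma)
    return -np.sum(norm.logpdf(z) - np.log(sigma) - np.log(C))

res = minimize(neg_loglik, x0=[x.mean(), np.log(x.std())], method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)   # estimates of the *pre-truncation* mu and sigma
```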
The $\mu$ parameter of a truncated normal distribution describes its mean before truncation. The mean of the truncated normal distribution is
$$ \operatorname{E}(X \mid a<X<b) = \mu + \sigma\frac{\phi(\frac{a-\mu}{\sigma})-\phi(\frac{b-\mu}{\sigma})}{\Phi(\frac{b-\mu}{\sigma})-\Phi(\frac{a-\mu}{\sigma})} $$
where $a$ and $b$ are lower and upper truncation points. This mean is always between $a$ and $b$.
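A quick numerical check of this formula (with made-up parameter values; `scipy.stats.truncnorm` is used only as an independent reference):

```python
# Evaluate the truncated-mean formula and compare with scipy's truncnorm.
# The parameter values are made up for illustration.
import numpy as np
from scipy.stats import norm, truncnorm

mu, sigma, a, b = 1.0, 2.0, 0.0, 5.0
alpha, beta = (a - mu) / sigma, (b - mu) / sigma   # standardized truncation points

mean_formula = mu + sigma * (norm.pdf(alpha) - norm.pdf(beta)) / (norm.cdf(beta) - norm.cdf(alpha))
mean_scipy = truncnorm.mean(alpha, beta, loc=mu, scale=sigma)
print(mean_formula, mean_scipy)   # both agree and lie strictly between a and b
```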
You are right that sampling from a truncated normal distribution with parameters $\mu=-8$, $\sigma=0.2$, $a=0$, $b=\infty$ using a non-specialized algorithm would be very inefficient (you could even run into numbers falling below numerical precision); however, since the non-truncated normal distribution ranges from $-\infty$ to $\infty$, you can pick any truncation points such that $a < b$.
Fortunately, there are specialized algorithms that deal with such cases, such as the one described by Christian Robert (mentioned in a comment below).
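For the one-sided tail case above, a common specialized approach (often attributed to Robert, 1995) replaces naive rejection with a shifted-exponential proposal. The sketch below is my own illustration of that idea, not necessarily the exact algorithm referenced in the comment:

```python
# Accept-reject sampling from N(mu, sigma^2) truncated to [a, inf) using a
# shifted-exponential proposal; efficient even when a lies far in the upper tail,
# where naive rejection would essentially never accept.
import numpy as np

def sample_upper_tail_normal(mu, sigma, a, size, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    alpha = (a - mu) / sigma                          # standardized truncation point
    lam = (alpha + np.sqrt(alpha**2 + 4.0)) / 2.0     # rate of the exponential proposal
    out = np.empty(size)
    n = 0
    while n < size:
        z = alpha + rng.exponential(1.0 / lam)        # proposal: alpha + Exp(lam)
        if rng.random() <= np.exp(-0.5 * (z - lam)**2):
            out[n] = mu + sigma * z                   # map back to the original scale
            n += 1
    return out

print(sample_upper_tail_normal(-8.0, 0.2, 0.0, size=5))   # the mu=-8, sigma=0.2, a=0 case
```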
Best Answer
To truncate a distribution is to restrict its values to an interval and re-normalize the density so that the integral over that range is 1.
So, to truncate the $N(\mu, \sigma^{2})$ distribution to an interval $(a,b)$ would be to generate a random variable that has density
$$ p_{a,b}(x) = \frac{ \phi_{\mu, \sigma^{2}}(x) }{ \int_{a}^{b} \phi_{\mu, \sigma^{2}}(y) dy } \cdot \mathcal{I} \{ x \in (a,b) \} $$
where $\phi_{\mu, \sigma^{2}}(x)$ is the $N(\mu, \sigma^2)$ density. You could sample from this density in a number of ways. One way (the simplest way I can think of) to do this would be to generate $N(\mu, \sigma^2)$ values and throw out the ones that fall outside of the $(a,b)$ interval, as you mentioned. So, yes, those two bullets you listed would accomplish the same goal. Also, you are right that the empirical density (or histogram) of variables from this distribution would not extend to $\pm \infty$. It would be restricted to $(a,b)$, of course.
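For completeness, here is a minimal sketch of that generate-and-discard approach (parameter values are made up; it is only practical when $(a,b)$ captures a reasonable share of the probability mass):

```python
# Naive rejection sampling from a truncated normal: draw from N(mu, sigma^2)
# and keep only the values that fall inside (a, b). Parameters are illustrative.
import numpy as np

def truncated_normal_rejection(mu, sigma, a, b, size, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    kept = np.empty(0)
    while kept.size < size:
        draws = rng.normal(mu, sigma, size=size)
        kept = np.concatenate([kept, draws[(draws > a) & (draws < b)]])
    return kept[:size]

samples = truncated_normal_rejection(1.0, 2.0, 0.0, 5.0, size=10_000)
print(samples.min(), samples.max())   # everything lies inside (0, 5)
```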