I don't really see the answer in the other answer so let me do the calculation here. Your general Boltzmann Ansatz says that the probability of a state $n$ depends on its energy as
$$ p_n = C \exp(-\beta E_n) $$
where $\beta = 1/kT$. Fermions are identical particles that, for each "box" or one-particle state they can occupy (given e.g. by $nlms$ in the case of the Hydrogen atom-like states), admit either $N=0$ or $N=1$ particles in it. Higher numbers are forbidden by the Pauli exclusion principle. The energies of the multi-particle state with $N=1$ and $N=0$ in a particular one-particle state $nlms$ differ by $\epsilon$. Consequently,
$$ \frac{p_1}{p_0} = \frac{C\exp(-\beta (E+\epsilon))}{C\exp(-\beta E)} = \exp(-\beta \epsilon) $$
where I used the Boltzmann distribution. However, the probabilities that the number of particles in the given one-particle state is equal to $N=0$ or $N=1$ must add to one,
$$ p_0 + p_1 = 1.$$
These conditions are obviously solved by
$$ p_0 = \frac{1}{1+\exp(-\beta\epsilon)}, \qquad p_1 = \frac{\exp(-\beta\epsilon)}{1+\exp(-\beta\epsilon)}, $$
which implies that the expectation value of $N$ reproduces the Fermi-Dirac distribution:
$$\langle N \rangle = p_0\times 0 + p_1 \times 1 = p_1= \frac{1}{\exp(\beta\epsilon)+1} $$
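As a quick numerical sanity check (a sketch in Python with an arbitrary sample value of $\beta\epsilon$), one can confirm that the two probabilities are normalized and that $p_1$ equals the Fermi-Dirac occupancy:

```python
import math

beta_eps = 1.3  # arbitrary sample value of beta*epsilon

# Probabilities of N = 0 and N = 1 built from the Boltzmann weights
p0 = 1.0 / (1.0 + math.exp(-beta_eps))
p1 = math.exp(-beta_eps) / (1.0 + math.exp(-beta_eps))

# They must be normalized, and <N> = p1 must equal the Fermi-Dirac formula
assert abs(p0 + p1 - 1.0) < 1e-12
assert abs(p1 - 1.0 / (math.exp(beta_eps) + 1.0)) < 1e-12
```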
The calculation for bosons is analogous except that the Pauli exclusion principle doesn't restrict $N$. So the number of particles (indistinguishable bosons) in the given one-particle state may be $N=0,1,2,\dots $. For each such number $N$, we have exactly one distinct state (because we can't distinguish the particles). The probability of each such state is called $p_n$ where $n=0,1,2,\dots$.
We still have
$$\frac{p_{n+1}}{p_n} = \exp(-\beta\epsilon) $$
and
$$ p_0 + p_1 + p_2 + \dots = 1 $$
These conditions are solved by
$$ p_n = \frac{\exp(-n\beta\epsilon)}{1+\exp(-\beta\epsilon)+\exp(-2\beta\epsilon)+\dots } $$
Note that the ratio of adjacent $p_n$'s is what it should be, and the denominator was chosen so that the $p_n$ for $n=0,1,2,\dots$ sum to one.
The expectation value of the number of particles is
$$ \langle N \rangle = p_0 \times 0 + p_1 \times 1 + p_2\times 2 + \dots $$
because each possible number of particles, an integer, must be weighted by its probability. The denominator of $\langle N \rangle$ is inherited from the denominator of $p_n$ above; it is a geometric series with ratio $q = \exp(-\beta\epsilon)$ that sums to
$$ \frac{1}{1-q} = \frac{1}{1-\exp(-\beta\epsilon)} $$
Don't forget that $1-\exp(-\beta\epsilon)$ is in the denominator of the denominator, so it is effectively in the numerator.
However, the numerator of $\langle N \rangle$ is tougher and contains the extra factor of $n$ in each term. Nevertheless, the sum is analytically calculable:
$$ \sum_{n=0}^\infty n \exp(-n \beta\epsilon) = - \frac{\partial}{\partial (\beta\epsilon)} \sum_{n=0}^\infty \exp(-n \beta\epsilon) =\dots$$
$$\dots = - \frac{\partial}{\partial (\beta\epsilon)} \frac{1}{1-\exp(-\beta\epsilon)} = \frac{\exp(-\beta\epsilon)}{(1-\exp(-\beta\epsilon))^2} $$
This result's denominator has a second power. One factor of $1-\exp(-\beta\epsilon)$ cancels against the denominator computed before, and the result is therefore
$$ \langle N \rangle = \frac{\exp(-\beta\epsilon)}{1-\exp(-\beta\epsilon)} = \frac{1}{\exp(\beta\epsilon)-1} $$
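The geometric sums above can be verified directly (a minimal Python sketch; $\beta\epsilon$ is an arbitrary sample value, and the infinite sums are truncated at a large cutoff where the tail is negligible since $q<1$):

```python
import math

beta_eps = 0.7           # arbitrary sample value of beta*epsilon
q = math.exp(-beta_eps)  # ratio of adjacent probabilities p_{n+1}/p_n

# Truncate the infinite sums; the tail beyond N_MAX is utterly negligible
N_MAX = 2000
Z = sum(q**n for n in range(N_MAX))            # normalization (denominator)
avg = sum(n * q**n for n in range(N_MAX)) / Z  # <N> as a weighted sum

# The direct sums must reproduce the closed forms derived in the text
assert abs(Z - 1.0 / (1.0 - q)) < 1e-10
assert abs(avg - 1.0 / (math.exp(beta_eps) - 1.0)) < 1e-10
```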
which is the Bose-Einstein distribution.
You could also obtain the Boltzmann distribution for distinguishable particles by a similar calculation. For such particles, $N$ can take the same values as it did for bosons. However, the Maxwell-Boltzmann counting weights the configuration with $N$ particles in the one-particle state by the Gibbs factor $1/N!$, which compensates for the permutations of the distinguishable particles. The sum $\sum_N q^N/N!$ then yields the Taylor expansion of the exponential $\exp(q)$, and the expectation value comes out as $\langle N \rangle = q = \exp(-\beta\epsilon)$ exactly.
Note added later: the derivation above was for $\mu=0$. When the chemical potential is nonzero, all appearances of $\epsilon$ have to be replaced by $(\epsilon-\mu)$. Of course, one may only talk about a well-defined value of $\mu$ when we deal with the grand canonical ensemble; it is impossible to derive a formula depending on $\mu$ from one that contains no $\mu$ and assumes it's ill-defined. The derivation above was meant to show that the nontrivial $1/(\exp\pm 1)$ structures do appear from a simpler $\exp(-\beta E)$ Ansatz, because I think that is the only nontrivial thing to be shown while discussing the relations between the Boltzmann and BE/FD distributions. If that derivation proves the same link as the textbook does, then I apologize, but I think there is "nothing else" of a similar kind to be proven.
It takes a lengthy proof, but Lyman Spitzer shows in the second chapter of Physical Processes in the Interstellar Medium (the standard text in interstellar matter studies) that the velocity distribution of interstellar gas particles (which is what forms nebulae) is very nearly Maxwellian - the deviation is less than 1%.
Other larger systems, probably not so much - Maxwell-Boltzmann statistics work best when kinetic energy is dominant in a system. But I don't know much about the topic, so that is a guess.
You are absolutely right that the limit in which this approximation holds is
$$\beta(\epsilon - \mu) \gg 1 \,,$$
which is not trivially the 'high-temperature limit', and indeed looks rather like the low-temperature limit. However, it also looks like the limit of large negative $\mu$. If we want to know how temperature will affect the exponent, we need to know how temperature will affect the chemical potential. To proceed, suppose we're dealing with a gas of non-interacting particles. The grand potential is, in this limit,
$$ \Phi = -k_B T \ln \mathcal{Z} = -k_B T \int_0^\infty \ln \mathcal{Z}_\epsilon \,g(\epsilon)\,\mathrm{d}\epsilon \simeq -k_B T \int_0^\infty \ln \bigg(1 + \exp(-\beta(\epsilon - \mu))\bigg)\,g(\epsilon) \,\mathrm{d}\epsilon \,,$$
where $\mathcal{Z}_\epsilon$ is the grand partition function associated with the energy level $\epsilon$ and $g(\epsilon)$ is the density of states. The integral is essentially just a sum of the partition functions due to each energy level. To get to the final expression we have assumed that we can approximate the grand partition function like so:
$$ \mathcal{Z}_\epsilon = \sum_{n} \bigg(\exp(-\beta(\epsilon - \mu))\bigg)^n \simeq 1 + \exp(-\beta(\epsilon - \mu)) \,,$$
which corresponds to the limit stated at the top. As a brief detour, if we want to find the average occupancy of the energy level $\epsilon$, we can use
$$ \langle N_\epsilon \rangle = -\left(\frac{\partial \Phi_\epsilon}{\partial \mu}\right)_{T,V} \simeq \exp(-\beta(\epsilon - \mu))\qquad \mathrm{where} \qquad \Phi_\epsilon = -k_B T \ln \mathcal{Z}_\epsilon\,,$$
which is the Maxwell-Boltzmann distribution we were expecting (in the second equality we have Taylor expanded the logarithm in accordance with $\beta(\epsilon - \mu) \gg 1$). Now the density of states for a three-dimensional gas in a box can be obtained by standard means --- I won't bother going through it here, but the end result is:
$$ \Phi = -k_B TV\left(\frac{mk_B T}{2 \pi \hbar^2}\right)^{3/2} \exp(\beta \mu) \equiv -\frac{k_B T V}{\lambda^3} \exp(\beta \mu) \,,$$
where the thermal wavelength $\lambda$ has been defined appropriately. From here we can write
$$N_\mathrm{tot} \equiv N = -\left(\frac{\partial \Phi}{\partial \mu}\right)_{T,V} = \frac{V}{\lambda^3} \exp(\beta \mu) \,,$$
and hence
$$ \boxed{\mu = k_B T \ln \left(\frac{N \lambda^3}{V}\right) \,.}$$
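The closed form $N = (V/\lambda^3)\exp(\beta\mu)$, which the boxed expression simply inverts, can be checked numerically. Below is a sketch in natural units $\hbar = m = k_B T = 1$ (my choice, so $\lambda = \sqrt{2\pi}$), using the standard spin-0 density of states $g(\epsilon) = (V/4\pi^2)(2m/\hbar^2)^{3/2}\sqrt{\epsilon}$ for a 3D box, which the text alludes to but does not write out:

```python
import math

# Natural units: hbar = m = k_B*T = 1, so beta = 1 and lambda = sqrt(2*pi)
V = 1.0
mu = -10.0                      # deep in the classical regime, beta*mu << 0
lam = math.sqrt(2.0 * math.pi)  # thermal wavelength in these units

# Density of states of a spin-0 particle in a 3D box (standard result):
# g(eps) = (V / 4 pi^2) * (2 m / hbar^2)^{3/2} * sqrt(eps)
def g(eps):
    return V / (4.0 * math.pi**2) * 2.0**1.5 * math.sqrt(eps)

# N = integral of g(eps) * exp(-beta*(eps - mu)) via a midpoint Riemann sum
de = 1e-4
N = sum(g(de * (i + 0.5)) * math.exp(-(de * (i + 0.5) - mu)) * de
        for i in range(400_000))  # integrate eps up to 40

# Compare with the closed form N = (V / lambda^3) * exp(beta*mu)
N_closed = V / lam**3 * math.exp(mu)
assert abs(N - N_closed) / N_closed < 1e-6

# Inverting, the boxed formula recovers the input chemical potential
mu_rec = math.log(N * lam**3 / V)
assert abs(mu_rec - mu) < 1e-5
```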
Now to answer your question. The condition at the top can be considered the limit of $\beta \mu$ being large and negative. We see from the above that
$$ \beta \mu = \ln\left(\frac{N \lambda^3}{V}\right) \qquad \mathrm{where} \qquad \lambda = \left( \frac{2 \pi \hbar^2}{mk_B T}\right)^{1/2} \,.$$
This quantity will be large and negative when the argument of the logarithm is small. This will be the case for a) low densities $N/V$, b) high temperatures $T$ and/or c) high-mass particles.
You should think of the underlying situation in which the classical limit holds as one where the number of thermally accessible states vastly exceeds the number of particles. This is because under such circumstances we can ignore multiple occupation of energy levels, which means we can ignore the fine details of particle indistinguishability. In the canonical distribution, when the number of states vastly exceeds the number of particles, we can account for indistinguishability with a simple (but approximate) correction: dividing the partition function by $N!$ --- we must do this even in the classical case, otherwise we run into all sorts of problems like the Gibbs paradox. However, when states start to become multiply occupied, this simple prescription fails, and we need to be more sophisticated in our treatment of particle indistinguishability.
If you imagine our gas particles as being wavepackets with a width of $\lambda$ as defined above, then you can think of each particle as occupying a volume $\lambda^3$. This has a nice interpretation --- the quantity $N \lambda^3/V$ that appears in the expression for the chemical potential can be thought of as the fraction of space occupied by the particles. The classical limit corresponds to this quantity being small, so that it's very unlikely for two particles to be in the same place --- i.e., be in the same state (here I'm essentially considering the states of our system to be position eigenstates rather than the usual energy eigenstates). If this quantity becomes larger, we start to get 'multiple-occupation', and so we imagine our classical approximation will break down. This is consistent: when $N \lambda^3 /V \sim 1$, the argument of the logarithm in the chemical potential is no longer large and negative, and so indeed the condition at the very top of this page breaks down.
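To put rough numbers to this (an illustrative sketch; the choice of nitrogen at room conditions is mine, not from the text), one can estimate the degeneracy parameter $N\lambda^3/V$ and hence $\beta\mu$ for an air-like gas:

```python
import math

# Illustrative numbers (my assumptions): N2 gas at room conditions
hbar = 1.054571817e-34       # J*s
k_B  = 1.380649e-23          # J/K
m    = 28 * 1.66053907e-27   # kg, molecular mass of N2
T    = 300.0                 # K
n    = 101325.0 / (k_B * T)  # number density from the ideal gas law, m^-3

lam = math.sqrt(2.0 * math.pi * hbar**2 / (m * k_B * T))  # thermal wavelength
degeneracy = n * lam**3      # the quantity N*lambda^3 / V from the text
beta_mu = math.log(degeneracy)

# An air-like gas at room temperature is deep in the classical regime
assert degeneracy < 1e-6
assert beta_mu < -10
```

The degeneracy parameter comes out around $10^{-7}$, so the classical (Maxwell-Boltzmann) treatment of ordinary air is an excellent approximation.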
Hope this helps!