I don't really see the answer in the other answer so let me do the calculation here. Your general Boltzmann Ansatz says that the probability of a state $n$ depends on its energy as
$$ p_n = C \exp(-\beta E_n) $$
where $\beta = 1/kT$. Fermions are identical particles that, for each "box" or one-particle state they can occupy (given e.g. by $nlms$ in the case of the Hydrogen atom-like states), admit either $N=0$ or $N=1$ particles in it. Higher numbers are forbidden by the Pauli exclusion principle. The energies of the multi-particle state with $N=1$ and $N=0$ in a particular one-particle state $nlms$ differ by $\epsilon$. Consequently,
$$ \frac{p_1}{p_0} = \frac{C\exp(-\beta (E+\epsilon))}{\exp(-\beta E)} = \exp(-\beta \epsilon) $$
where I used the Boltzmann distribution. However, the probabilities that the number of particles in the given one-particle state is equal to $N=0$ or $N=1$ must add to one,
$$ p_0 + p_1 = 1.$$
These conditions are obviously solved by
$$ p_0 = \frac{1}{1+\exp(-\beta\epsilon)}, \qquad p_1 = \frac{\exp(-\beta\epsilon)}{1+\exp(-\beta\epsilon)}, $$
which implies that the expectation value of $n$ is equal to the right formula for the Fermi-Dirac distribution:
$$\langle N \rangle = p_0\times 0 + p_1 \times 1 = p_1= \frac{1}{\exp(\beta\epsilon)+1} $$
The calculation for bosons is analogous except that the Pauli exclusion principle doesn't restrict $N$. So the number of particles (indistinguishable bosons) in the given one-particle state may be $N=0,1,2,\dots $. For each such number $N$, we have exactly one distinct state (because we can't distinguish the particles). The probability of each such state is called $p_n$ where $n=0,1,2,\dots$.
We still have
$$\frac{p_{n+1}}{p_n} = \exp(-\beta\epsilon) $$
and
$$ p_0 + p_1 + p_2 + \dots = 1 $$
These conditions are solved by
$$ p_n = \frac{\exp(-n\beta\epsilon)}{1+\exp(-\beta\epsilon)+\exp(-2\beta\epsilon)+\dots } $$
Note that the ratio of the adjacent $p_n$ is what it should be and the denominator was chosen so that all the $p_n$ from $n=0,1,2\dots$ sum up to one.
The expectation value of the number of particles is
$$ \langle N \rangle = p_0 \times 0 + p_1 \times 1 + p_2\times 2 + \dots $$
because the number of particles, an integer, must be weighted by the probability of each such possibility. The denominator is still inherited from the denominator of $p_n$ above; it is equal to a geometric series that sums up to
$$ \frac{1}{1-q} = \frac{1}{1-\exp(-\beta\epsilon)} $$
Don't forget that $1-\exp(-\beta\epsilon)$ is in the denominator of the denominator, so it is effectively in the numerator.
However, the numerator of $\langle N \rangle$ is tougher and contains the extra factor of $n$ in each term. Nevertheless, the sum is analytically calculable:
$$ \sum_{n=0}^\infty n \exp(-n \beta\epsilon) = - \frac{\partial}{\partial (\beta\epsilon)} \sum_{n=0}^\infty \exp(-n \beta\epsilon) =\dots$$
$$\dots = - \frac{\partial}{\partial (\beta\epsilon)} \frac{1}{1-\exp(-\beta\epsilon)} = \frac{\exp(-\beta\epsilon)}{(1-\exp(-\beta\epsilon))^2} $$
This result's denominator has a second power. One of the copies gets cancelled with the denominator before and the result is therefore
$$ \langle N \rangle = \frac{\exp(-\beta\epsilon)}{1-\exp(-\beta\epsilon)} = \frac{1}{\exp(\beta\epsilon)-1} $$
which is the Bose-Einstein distribution.
You could also obtain another version of the Boltzmann distribution for distinguishable particles by a similar calculation. For such particles, $N$ could take the same values as it did for bosons. However, the multiparticle state with $N$ particles in the one-particle state would be degenerate because the particles are distinguishable. There would actually be $N!$ multiparticle states with $N$ particles in them. The sum would yield a Taylor expansion for the same exponential.
Note added later: the derivation above was for $\mu=0$. When the chemical potential is nonzero, all appearances of $\epsilon$ have to be replaced by $(\epsilon-\mu)$. Of course that one may only talk about a well-defined value of $\mu$ when we deal with a grand canonical potential; it is impossible to derive a formula depending on $\mu$ from one that contains no $\mu$ and assumes it's ill-defined. The derivation above was meant to show that the difficult $1/(\exp\pm 1)$ structures do appear from a simpler $\exp(-\beta E)$ Ansatz because I think it's the only nontrivial thing to be shown while discussing the relations between the Boltzmann and BE/FD distributions. If that derivation proves the same link as the textbook does, then I apologize but I think there is "nothing else" of a similar kind to be proven.
On the deepest level, particles are indistinguishable if and only if they have the same quantum numbers (mass, spin, and charges).
However, in statistical mechanics one often studies effective theories where there are additional means of distinguishing particles. Two important examples:
In modeling molecular fluids, two atoms on the same molecule are distinguishable if and only if there is no molecular symmetry interchanging the two atoms, and two atoms in different molecules are distinguishable if and only if there is no congruent matching of the two molecules such that the two atoms correspond to each other.
In modeling the solid state, one typically assumes that the atoms are confined to lattice sites, and that each site is occupied at most once.. In this case, the position in the lattice is a distinguishable label, which makes all atoms distinguishable.
The computational relevance of the distinction is that permutations of (in)distinguishable particles (don't) count towards the weighting factor.
For an expanded discussion see my article at PhysicsForums.
Best Answer
I just remember $$ \frac{1}{\exp(\beta (E-\mu)) \pm 1}$$ You can work out the sign from the fact that Bose-Einstein distributions can diverge (so they go with the - sign), whereas Fermi-Dirac is bounded (so they go with the + sign). Maxwell-Boltzmann applies to classical systems, so quantum statistics don't matter, so take the limit that the two distributions are the same (so drop the $\pm1$).
These expressions represent the average number of particles occupying a state with energy $E$. The chemical potential $\mu$ is just a knob that lets you adjust the overall density. You can also think of it as (roughly) the energy it takes to add a particle to the system. To find the total number of particles in the system you have to sum this over all of the energy levels. You can use this information to find all kinds of thermal averages. For example:
$$ \text{total energy} = \sum_{\text{all energies}} \left(\text{distribution function}\times\text{number of states with a given energy}\right)$$
This is essentially what is going on in Planck's law, only the sum is left off. Planck's law is the Bose-Einstein distribution (with $\mu=0$ because photons can be freely created and destroyed) multiplied by the number of states with an energy in a small range about $E$. This tells you how much energy there is in the photons with energies in that range.