Why do we study these different ensembles: microcanonical, canonical, and grand canonical? Are they used for studying different physical systems or scenarios (e.g., in some systems you can only use the microcanonical ensemble, while in other cases you can only apply the canonical ensemble)? Do they give the same results in the thermodynamic limit?
[Physics] Why do we need different ensembles in statistical mechanics
statistical mechanics
Related Solutions
I don't really see the answer in the other answer, so let me do the calculation here. The general Boltzmann Ansatz says that the probability of a state $n$ depends on its energy as $$ p_n = C \exp(-\beta E_n) $$ where $\beta = 1/kT$. Fermions are identical particles for which each "box" or one-particle state they can occupy (labelled e.g. by $nlms$ in the case of Hydrogen-like atomic states) admits either $N=0$ or $N=1$ particles; higher occupation numbers are forbidden by the Pauli exclusion principle. The energies of the multi-particle states with $N=1$ and $N=0$ in a particular one-particle state $nlms$ differ by $\epsilon$. Consequently, $$ \frac{p_1}{p_0} = \frac{C\exp(-\beta (E+\epsilon))}{C\exp(-\beta E)} = \exp(-\beta \epsilon) $$ where I used the Boltzmann distribution. However, the probabilities that the number of particles in the given one-particle state is equal to $N=0$ or $N=1$ must add to one, $$ p_0 + p_1 = 1.$$ These conditions are obviously solved by $$ p_0 = \frac{1}{1+\exp(-\beta\epsilon)}, \qquad p_1 = \frac{\exp(-\beta\epsilon)}{1+\exp(-\beta\epsilon)}, $$ which implies that the expectation value of $N$ is equal to the right formula for the Fermi-Dirac distribution: $$\langle N \rangle = p_0\times 0 + p_1 \times 1 = p_1= \frac{1}{\exp(\beta\epsilon)+1} $$ The calculation for bosons is analogous except that the Pauli exclusion principle doesn't restrict $N$. So the number of particles (indistinguishable bosons) in the given one-particle state may be $N=0,1,2,\dots $. For each such number $N$, we have exactly one distinct state (because we can't distinguish the particles). The probability of each such state is called $p_n$ where $n=0,1,2,\dots$.
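As a quick numerical sanity check of the two-state argument above, here is a minimal Python sketch (function name and parameter values are mine, chosen for illustration) that builds $\langle N \rangle$ from the two Boltzmann weights and compares it with the Fermi-Dirac closed form:

```python
import math

def fermi_dirac_occupation(eps, beta):
    """Expected occupation <N> of a single fermionic mode of energy eps,
    built from the two Boltzmann weights p0 ~ 1 and p1 ~ exp(-beta*eps)."""
    w0 = 1.0                     # weight of the empty state (energy E)
    w1 = math.exp(-beta * eps)   # weight of the occupied state (energy E + eps)
    p1 = w1 / (w0 + w1)          # normalised so that p0 + p1 = 1
    return p1                    # <N> = 0*p0 + 1*p1

# The result matches the closed form 1/(exp(beta*eps) + 1).
beta, eps = 2.0, 0.7
assert abs(fermi_dirac_occupation(eps, beta)
           - 1.0 / (math.exp(beta * eps) + 1.0)) < 1e-12
```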
We still have $$\frac{p_{n+1}}{p_n} = \exp(-\beta\epsilon) $$ and $$ p_0 + p_1 + p_2 + \dots = 1 $$ These conditions are solved by $$ p_n = \frac{\exp(-n\beta\epsilon)}{1+\exp(-\beta\epsilon)+\exp(-2\beta\epsilon)+\dots } $$ Note that the ratio of the adjacent $p_n$ is what it should be and the denominator was chosen so that all the $p_n$ from $n=0,1,2\dots$ sum up to one.
The expectation value of the number of particles is $$ \langle N \rangle = p_0 \times 0 + p_1 \times 1 + p_2\times 2 + \dots $$ because the number of particles, an integer, must be weighted by the probability of each such possibility. The denominator is still inherited from the denominator of $p_n$ above; it is equal to a geometric series that sums up to $$ \frac{1}{1-q} = \frac{1}{1-\exp(-\beta\epsilon)} $$ Don't forget that $1-\exp(-\beta\epsilon)$ is in the denominator of the denominator, so it is effectively in the numerator.
However, the numerator of $\langle N \rangle$ is tougher and contains the extra factor of $n$ in each term. Nevertheless, the sum is analytically calculable: $$ \sum_{n=0}^\infty n \exp(-n \beta\epsilon) = - \frac{\partial}{\partial (\beta\epsilon)} \sum_{n=0}^\infty \exp(-n \beta\epsilon) =\dots$$ $$\dots = - \frac{\partial}{\partial (\beta\epsilon)} \frac{1}{1-\exp(-\beta\epsilon)} = \frac{\exp(-\beta\epsilon)}{(1-\exp(-\beta\epsilon))^2} $$ This result's denominator contains a second power of $1-\exp(-\beta\epsilon)$; one of the two factors cancels against the denominator computed before, and the result is therefore $$ \langle N \rangle = \frac{\exp(-\beta\epsilon)}{1-\exp(-\beta\epsilon)} = \frac{1}{\exp(\beta\epsilon)-1} $$ which is the Bose-Einstein distribution.
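The bosonic sum can be checked numerically as well: a short sketch (again with illustrative names of my own) that truncates the geometric series at a large cutoff and compares the weighted average with the Bose-Einstein closed form:

```python
import math

def bose_einstein_occupation(eps, beta, nmax=200):
    """<N> for a single bosonic mode: weight each occupation n = 0, 1, 2, ...
    by exp(-n*beta*eps) and normalise, truncating the geometric series at nmax."""
    weights = [math.exp(-n * beta * eps) for n in range(nmax + 1)]
    Z = sum(weights)  # truncated partition sum of the single mode
    return sum(n * w for n, w in enumerate(weights)) / Z

beta, eps = 2.0, 0.7
closed_form = 1.0 / (math.exp(beta * eps) - 1.0)  # Bose-Einstein distribution
assert abs(bose_einstein_occupation(eps, beta) - closed_form) < 1e-10
```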
You could also obtain another version of the Boltzmann distribution, for distinguishable classical particles, by a similar calculation. For such particles, $N$ can take the same values as it did for bosons, but a configuration with $N$ particles in the one-particle state can be realised by $N!$ permutations of the distinguishable particles; with the usual Gibbs counting one divides by this $N!$, so the weights become $q^N/N!$ with $q=\exp(-\beta\epsilon)$, and the sum yields the Taylor expansion of an exponential, $\sum_N q^N/N! = \exp(q)$.
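Under the assumption of $q^N/N!$ weights (the Gibbs counting for classical identical particles, which is what makes the sum an exponential), the expected occupation reduces to the bare Boltzmann factor $q$; a small illustrative check:

```python
import math

def maxwell_boltzmann_occupation(eps, beta, nmax=100):
    """<N> for a classical mode with Gibbs counting: weights q^n / n!,
    where q = exp(-beta*eps); the truncated sum approximates exp(q)."""
    q = math.exp(-beta * eps)
    weights = [q**n / math.factorial(n) for n in range(nmax + 1)]
    Z = sum(weights)  # truncated Taylor series of exp(q)
    return sum(n * w for n, w in enumerate(weights)) / Z

beta, eps = 2.0, 0.7
# With this counting, <N> = q = exp(-beta*eps): the classical Boltzmann factor.
assert abs(maxwell_boltzmann_occupation(eps, beta) - math.exp(-beta * eps)) < 1e-12
```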
Note added later: the derivation above was for $\mu=0$. When the chemical potential is nonzero, all appearances of $\epsilon$ have to be replaced by $(\epsilon-\mu)$. Of course, one may only talk about a well-defined value of $\mu$ when dealing with the grand canonical ensemble; it is impossible to derive a formula depending on $\mu$ from one that contains no $\mu$ and assumes it is ill-defined. The derivation above was meant to show that the nontrivial $1/(\exp(\beta\epsilon)\pm 1)$ structures do emerge from the simpler $\exp(-\beta E)$ Ansatz, because I think that is the only nontrivial thing to be shown when discussing the relations between the Boltzmann and BE/FD distributions. If that derivation proves the same link as the textbook does, then I apologize, but I think there is "nothing else" of a similar kind to be proven.
Your second point, which I think is the most important one, is right, but it is not as problematic as it seems. You make the point about temperature, but the same thing could be said of density. Consider a gas (an ideal gas, to keep it simple) in either the microcanonical or the canonical ensemble: if you partition the box into two halves, the one-particle density on each side is not necessarily the same, and it becomes exactly the same only in the thermodynamic limit.
Note also that, although the density need not be uniform, the most likely macrostate characterised by the number of particles in one of the halves corresponds to the case where the density is the same in the two parts of the box.
What you describe is exactly the same thing but with the energy instead of the number of particles for a gas.
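The shrinking of these half-box density fluctuations with system size can be illustrated by a small simulation (a sketch under the simplest assumption of non-interacting particles placed uniformly at random; the relative fluctuation of the left-half count scales like $1/\sqrt{N}$):

```python
import random

def left_half_fraction(n_particles, rng):
    """Place n_particles uniformly in [0, 1) and return the fraction
    that lands in the left half of the box."""
    return sum(rng.random() < 0.5 for _ in range(n_particles)) / n_particles

rng = random.Random(0)
# The deviation from 1/2 shrinks like 1/sqrt(N): the two halves become
# equally dense only in the thermodynamic limit.
for n in (100, 10_000, 1_000_000):
    print(n, abs(left_half_fraction(n, rng) - 0.5))
```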
Now, some people have tried to understand deeply what statistical mechanics is about after Gibbs and have come up with some original and important ideas; among them you will find:
Khinchin on a mathematical formulation of statistical mechanics
Jaynes on a statistical inference interpretation of statistical mechanics
Fermi with the Fermi-Pasta-Ulam problem
Kolmogorov, Arnold, and Moser for the KAM theorem
Oliver Penrose (brother of Roger Penrose), who devised his own theory to give a rationale for statistical mechanics
Roger Balian for continuing the work of Jaynes and extending it to quantum systems
Vulpiani on the relation between deterministic chaos and statistical mechanics
Lawrence Sklar on the philosophical issues of the foundations of statistical mechanics
This is not an exhaustive list but these are the authors that really made me change my mind on many misconceptions I had on statistical mechanics.
Best Answer
If somebody tells you the entropy as a function of energy, volume, and number of particles, you have all the information you need (for a standard plain-vanilla system). It is not necessary to define any other ensemble, but it is convenient. If your system, for instance, is in contact with another, much larger system (a "reservoir") with which it can exchange energy, then you can either describe system plus reservoir microcanonically, or you can describe only your system canonically. The latter is clearly more convenient, since you need not bother about the internal "workings" of the reservoir: for the purpose of your problem, the entire reservoir is perfectly well characterized by a single number, its temperature.
The mathematical machinery of Legendre transforms provides a neat way to change from a thermodynamic potential (such as the entropy) to other potentials in which derivatives of the original thermodynamic potential become the new variables, and this transformation loses no information. So, at the end of the day, this is just mathematical convenience: represent the necessary thermodynamic information in ways that are easier to handle in a given situation characterized by a particular set of constraints.
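For concreteness, the standard example of such a transform (sketched here in the energy representation, which is equivalent to the entropy representation used above): starting from $E(S,V,N)$, one trades $S$ for its conjugate variable $T$ and obtains the Helmholtz free energy, the natural potential of the canonical ensemble:

```latex
T = \left(\frac{\partial E}{\partial S}\right)_{V,N},
\qquad
F(T, V, N) = E - TS,
\qquad
dF = -S\,dT - p\,dV + \mu\,dN .
```

Since $dE = T\,dS - p\,dV + \mu\,dN$, the differential $dF$ follows immediately, and no information is lost: $S = -(\partial F/\partial T)_{V,N}$ recovers the traded variable.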