We -first- exhaust the chemical potential, and -then- start occupy the lowest state. What is the reasoning for this?
That "first"/"then" isn't the logic used in deriving the Bose-Einstein condensate, but it is an intuitive picture of what's going on.
So, we have for our grand canonical potential something like: $\Omega=T \sum_{i} \log(1-e^{\beta(\mu-\varepsilon_i)})$, and we can find the number of particles through $N=-\frac{\partial \Omega}{\partial \mu}$. Those are exact, true equations of quantum statistical mechanics. Here is where your confusion lies:
We approximate $\Omega=T \int \nu(\varepsilon)\log(1-e^{\beta(\mu-\varepsilon)})d\varepsilon$, where $\nu$ is the density of states. This is a good approximation in the Fermi gas, where the low energy states are filled but only by a few out of $N\approx 10^{23}$ particles. This is a good approximation for an ideal gas, where $T$ is large and most occupation is in higher energy states. But this is a terrible approximation for BECs because the behavior of the low energy states isn't captured at all by the integral!
So what are our options now? We probably don't want to work directly with the sum, but we can't just take an integral. Some books like to pull out the ground state term and approximate the rest with an integral, and I'll do that here.
$$\Omega=T \log(1-e^{\beta \mu})+\int \nu(\varepsilon)\log(1-e^{\beta(\mu-\varepsilon)})d\varepsilon$$
This gives... (denote $z=e^{\beta \mu}$)
$$N=\frac{z}{1-z}+\int \nu(\varepsilon)\frac{1}{z^{-1} e^{\beta \varepsilon}-1}d\varepsilon$$
For low temperature, the rightmost term is basically constant and the leftmost one varies dramatically ($\frac{z}{1-z}\to\infty$ as $\mu \to 0$), allowing you to match any desired value of $N$.
Now, you could ask the question: At a finite temperature $T$ with particle number $N$, if I add one more to $N$, how much is split between the ground state and how much is split between the excited states? Well, if $|\mu|$ is small (this is equivalent to $T<T_c$), $z\approx 1-\beta |\mu|$, $\nu=C \varepsilon^{1/2}$, and $\int \frac{\varepsilon^{1/2}}{z^{-1}e^{\beta\varepsilon}-1}d\varepsilon=T^{3/2} \Gamma(3/2) \mathrm{Li}_{3/2}(z)$. Mathematica tells me $\mathrm{Li}_{3/2}(1-\beta |\mu|)\approx \zeta(3/2)-2\sqrt{\pi \beta |\mu|}+O(\beta \mu)$, so overall:
\begin{align*}
N&\approx\frac{z}{1-z}+\int \nu(\varepsilon)\frac{1}{z^{-1} e^{\beta \varepsilon}-1}d\varepsilon\\
&= \frac{1}{\beta |\mu|}-1+C T^{3/2} \Gamma(3/2)\left( \zeta(3/2)-2\sqrt{\pi \beta |\mu|} \right)+O(\mu)
\end{align*}
For large $N$ and small $\mu$, the contribution of the first term, $\frac{1}{\mu}$, dominates by far! It tells you that "as $N\to \infty$, the ground state starts to hold almost all the states".
The contribution of the proceeding terms include a constant term, which is the notion that you fill excited states first, and a correction to the notion, which says you leave a tiny bit of the excited states unfilled, proportional to $\sqrt{|\mu|}$.
You can go on and on with this stuff at varying degrees of rigor. Pathria Statistical Mechanics appendix F does so in a way that doesn't pull the first term out of the integral, but instead works with the sums directly by writing $\varepsilon$ in terms of the energy levels of a particle in a 3D box.
[edit/footnote]: note that the integrals I take are from $\varepsilon=0$ to $\varepsilon=\infty$. I assume the ground state to have zero energy! This is fine, because all we care about is the difference $\mu-\varepsilon_i$. I can shift $\varepsilon$ by whatever energy I want, so long as I shift $\mu$ correspondingly to keep the difference the same. The choice $\varepsilon_0=0$ gives the usual $\mu\to 0$ as $T\to 0$ law, but if we shifted everything to $\varepsilon_0=10 \mathrm{ Joules}$ or whatever, you would certainly have $\mu\to 10 \mathrm{ Joules}$ as $T\to 0$. The equations would be a lot uglier but the physics would be the same.
Best Answer
Fermions are indistinguishable particles, hence you cannot distibute the three fermions in different ways. The state is completely described by the occupations numbers $|1,2\rangle$ and has zero entropy. If you are talking about a Fermi system at a given temperature, that means it is (or was) in contact with the external world at that temperature (canonical or grand canonical ensemble). The state of such system is described by a density matrix. In equilibrium it is given by $$\rho = \frac{e^{-\beta H}}{Tr \, e^{-\beta H}}.$$ In the energy eigenbasis, the numerator reads $$\sum_n e^{-\beta E_n} \, |E_n\rangle\langle E_n|. $$ In the limit $T\rightarrow 0$, $\rho$ approaches $|E_0\rangle \langle E_0| $. This density matrix describes a quantum mechanical pure state, which always has zero entropy.