Okay, this is actually pretty straightforward, but I don't know where to start.
Review: What's a partition function?
Let's step back and derive what we're talking about: what is a partition function? So we have a system which takes on a set of energy levels with degeneracies $\left \{(E_i, g_i) \right \}.$
Suppose our system $s$ is in contact with a reservoir $r$, and together they are sealed up in a microcanonical ensemble with $S = S_s + S_r$ and $U = U_s + U_r$. Now that reservoir is big and complicated, so over the (to it) smallish changes in $U_s$ its entropy can be linearized as $S_r(U - U_s) = S_r(U) - U_s/T$, where $T$ is its (effectively constant) thermodynamic temperature $T^{-1} = \left(\frac{\partial S_r}{\partial U_r}\right)_{N_r,~V_r}$. Therefore when the system is in state $i$, the reservoir entropy is $S_r(i) = S_0 - E_i/T$ for some constant $S_0$. But we know that the definition of entropy is $S = k_B \ln W$, where $W$ is the multiplicity of the state, so counting in the degeneracy, the total multiplicity of state $i$ is simply:
$$W_i = g_i ~ W_r(i) = g_i ~ e^{S_0/k_B - E_i / (k_B T)} $$and the probability is therefore $$p_i = \frac{W_i}{\sum_k W_k} =\frac {g_i ~ e^{-E_i / (k_B T)}} Z$$ for some constant $Z$ independent of the state index $i$, incorporating both $\sum_k W_k$ and $e^{S_0/k_B}$. Since the probabilities sum to 1, we can say that:$$Z = \sum_i g_i ~ \exp\left(\frac{-E_i} {k_B T}\right). $$If the system is continuous then we need a density-of-states $g(E)$ so that the number of states with energies between $E$ and $E + dE$ is roughly $g(E) ~ dE$, then we convert the above to an integral.
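To make this concrete, here is a minimal sketch in Python (the levels and temperature are made-up illustrative numbers) that computes $Z$ and the probabilities $p_i$ directly from a set of $(E_i, g_i)$ pairs:

```python
import math

def partition_function(levels, kT):
    """Z = sum_i g_i * exp(-E_i / kT) for levels = [(E_i, g_i), ...]."""
    return sum(g * math.exp(-E / kT) for E, g in levels)

def probabilities(levels, kT):
    """Occupation probability of each level, p_i = g_i exp(-E_i/kT) / Z."""
    Z = partition_function(levels, kT)
    return [g * math.exp(-E / kT) / Z for E, g in levels]

# Illustrative three-level system (energies in units of kT):
levels = [(0.0, 1), (1.0, 2), (2.0, 1)]
p = probabilities(levels, kT=1.0)
assert abs(sum(p) - 1.0) < 1e-12  # probabilities sum to 1 by construction
```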
From particles to complex systems
Okay, now that we're both on the same page about what it is, what happens if your system has a bunch of parts? Then each $i$ now labels a configuration of the parts. It potentially gets complicated! The first easy thing to do is to ditch the degeneracies $g_i$ and instead store all of the energies in a multiset: a set which can hold the same number multiple times. That might be confusing, so let's proceed formally a different way.
Let's talk now about a set $C = \left\{c_i\right\}$ where the $c_i$ is some mathematical object telling me the configuration of the state $i$, and we'll assume that this is distinct for each $i$, and now we have to transition from a set of $E_i$ to a function $E(c_i)$ which gives the energy of a configuration of the parts. As a side effect now $g_i = 1$ for each $i$ since each configuration is treated independently, but the same result holds:$$Z = \sum_i \exp\left(\frac{-E(c_i)} {k_B T}\right).$$
Non-interacting systems
If you're with me so far, there's just one more step! What is the form of $c_i$ and $E(c)$?
Well, for a system of $N$ identical noninteracting particles, we have the single-particle energies $E_k$ from before, and the total energy is the sum of the energies of the occupied single-particle states. That is, the ideal form for $c_i$ is an occupation function, $c_i = \left\{n_{i,k}\right\}$, which tells us how many particles configuration $i$ places in the state with energy $E_k$. Then the energy of the state is:$$E(c_i) = \sum_k n_{i,k} ~ E_k,$$hence,$$Z = \sum_i \exp\left(\frac{-\sum_k n_{i,k} ~ E_k} {k_B T}\right).$$So that is where the sum up top comes from: we now have a complicated multi-particle state, but as long as the particles themselves are noninteracting we can use the sum of single-particle energies to get the overall energy.
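Here is a brute-force sketch (with made-up single-particle energies): for $N$ *distinguishable* noninteracting particles, summing $e^{-E(c)/k_BT}$ over every configuration agrees with the product of single-particle sums, $z^N$. (For indistinguishable particles, Bose or Fermi counting restricts the allowed occupation sets, so this identity no longer holds as stated.)

```python
import math
from itertools import product

def Z_config(energies, N, kT):
    """Sum exp(-E(c)/kT) over all configurations: each of N distinguishable
    particles independently occupies one single-particle state."""
    return sum(
        math.exp(-sum(energies[k] for k in occupation) / kT)
        for occupation in product(range(len(energies)), repeat=N)
    )

def Z_factorized(energies, N, kT):
    """(sum_k exp(-E_k/kT))**N, the factorized form."""
    z = sum(math.exp(-E / kT) for E in energies)
    return z ** N

E = [0.0, 0.7, 1.3]   # illustrative single-particle energies
assert abs(Z_config(E, 3, 1.0) - Z_factorized(E, 3, 1.0)) < 1e-9
```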
> I understand this to be the sum of all the particles, each being in a certain energy state $E_i$
This is not correct. The sum is taken over all possible microstates of the system, with $E_i$ being the energy of microstate $i$. For example, if my system has three possible states with energies $E_1,E_2,E_3$, then the partition function would be $Z=e^{-\beta E_1} + e^{-\beta E_2} + e^{-\beta E_3}$. If $E_2=E_3= E'$, then the partition function would be $Z=e^{-\beta E_1} + e^{-\beta E'} + e^{-\beta E'} = e^{-\beta E_1} + 2e^{-\beta E'}$. If two different microstates have the same energy $E'$, then $e^{-\beta E'}$ appears in the partition function sum twice - we're summing over states, not energies.
The idea behind this is as follows. Consider a system in thermal equilibrium with a large reservoir. If the state $i$ has energy $E_i$, then the probability $P_i$ that the system is in state $i$ is $P_i = Ce^{-\beta E_i}$ for some constant of proportionality $C$.
If we sum the probabilities over all possible states, then the result must of course be equal to 1; that means that
$$\sum_{i\in\text{States}} Ce^{-\beta E_i} = C \sum_{i\in \text{States}} e^{-\beta E_i} = 1 \implies C = \frac{1}{\sum_{i\in\text{States}} e^{-\beta E_i}}$$
Defining the partition function
$$Z := \sum_{i\in\text{States}} e^{-\beta E_i}$$
we then have that the probability that the particle is in state $i$ is simply
$$P_i = \frac{e^{-\beta E_i}}{Z}$$
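Putting the three-state example from above into code ($\beta$ and the energies are arbitrary illustrative values):

```python
import math

beta = 1.0
E1, Eprime = 0.0, 1.0            # three states with E2 = E3 = E'
states = [E1, Eprime, Eprime]    # we sum over states, not energies

Z = sum(math.exp(-beta * E) for E in states)
P = [math.exp(-beta * E) / Z for E in states]

# The degenerate energy is counted twice in Z, as it should be:
assert abs(Z - (math.exp(-beta * E1) + 2.0 * math.exp(-beta * Eprime))) < 1e-12
# The probabilities normalize by construction:
assert abs(sum(P) - 1.0) < 1e-12
```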
Having gotten that out of the way, I think there are two conceptual problems that you're running into:
- How do we handle this in classical physics, where states aren't discrete (and therefore can't simply be added up)?
- How does the number of particles in the system enter into this conversation?
In classical physics (in particular, the Hamiltonian formulation of mechanics), the state of a particle (moving in $3$ dimensions) corresponds to a point in $6$-dimensional phase space $(\vec q,\vec p)$. The state of a system of two particles moving in $3$ dimensions corresponds to a point in $12$-dimensional phase space $(\vec q_1, \vec q_2, \vec p_1, \vec p_2)$.
In general, if the system has $N$ particles, then the state of the system corresponds to a point in $6N-$dimensional phase space $(\vec q_1,\ldots, \vec q_N,\vec p_1, \ldots, \vec p_N)$.
If our system corresponds to a Hamiltonian
$$\mathcal H = \sum_{i=1}^N \frac{|\vec p_i|^2}{2m} + U(\vec q_1,\ldots, \vec q_N)$$
then the Boltzmann factor becomes
$$e^{-\beta \mathcal H} = e^{-\beta \left(\sum_{i=1}^N \frac{|\vec p_i|^2}{2m}\right)} \cdot e^{-\beta U(\vec q_1,\ldots, \vec q_N)}$$
Summing over all states amounts to integrating over all points in the $6N-$dimensional phase space, which means that
$$Z = \sum_{\text{States}} e^{-\beta H} \rightarrow \frac{1}{h^{3N}}\int d\vec q_1 \ldots \int d\vec q_N \int d\vec p_1 \ldots \int d\vec p_N e^{-\beta \left(\sum_{i=1}^N \frac{|\vec p_i|^2}{2m}\right)} \cdot e^{-\beta U(\vec q_1,\ldots, \vec q_N)}$$
where $h$ is some number with dimensions of momentum $\times$ length, so as to make the integral dimensionless (a common choice is Planck's constant, but this factor drops out from all physical predictions so it doesn't actually matter).
This looks nightmarish. However, exponentials have the beautiful property that $e^{x+y}=e^x e^y$; it follows that
$$Z = \frac{1}{h^{3N}} \left(\int d\vec p_1 e^{-\frac{\beta |\vec p_1|^2}{2m}}\right)\ldots \left(\int d\vec p_N e^{-\frac{\beta |\vec p_N|^2}{2m}}\right) \times \int d\vec q_1 \ldots \int d\vec q_N e^{-\beta U(\vec q_1,\ldots,\vec q_N)}$$
But of course each of those momentum integrals is the same, which means that
$$Z = \frac{1}{h^{3N}}\left(\int d\vec p e^{-\frac{\beta |\vec p|^2}{2m}}\right)^N \times \int d\vec q_1 \ldots \int d\vec q_N e^{-\beta U(\vec q_1,\ldots,\vec q_N)}$$
The first term is just a Gaussian integral, which is easy to evaluate. The second term is still generically nightmarish. However, if it turns out that the interparticle forces can be neglected (e.g. any potential is an external one, like an external gravitational field), then $U(\vec q_1,\ldots,\vec q_N) =\sum_{i=1}^N u(\vec q_i)$, where $u$ is the external potential. This means that the potential term can also be factored, yielding
$$Z = \frac{1}{h^{3N}} \left(\int d\vec q \int d\vec p e^{-\beta\left(\frac{|\vec p|^2}{2m} + u(\vec q)\right)}\right)^N$$
In such cases, it is reasonable to refer to the "single-particle partition function"
$$Z_1 = \frac{1}{h^3} \int d\vec q \int d\vec p e^{-\beta\left(\frac{|\vec p|^2}{2m}+u(\vec q)\right)}$$
and to simply say that $Z = Z_1^N$. Again, though, this only works if the potential energy term can be factorized, which requires either a non-interacting system or one which is subject to considerable simplifying assumptions.
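The momentum factor can be checked numerically (a sketch in illustrative units; the values of $m$, $\beta$, and the integration cutoff are made up): each Gaussian momentum integral equals $\sqrt{2\pi m/\beta}$, so for $u(\vec q)=0$ in a box of volume $V$ one gets $Z_1 = V(2\pi m k_B T)^{3/2}/h^3$.

```python
import math

def gauss_momentum_integral(m, beta, pmax=50.0, n=20001):
    """Trapezoidal estimate of the 1D integral of exp(-beta p^2 / (2m))."""
    dp = 2.0 * pmax / (n - 1)
    total = 0.0
    for i in range(n):
        p = -pmax + i * dp
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * math.exp(-beta * p * p / (2.0 * m)) * dp
    return total

m, beta = 1.0, 1.0                             # illustrative units
numeric = gauss_momentum_integral(m, beta)
analytic = math.sqrt(2.0 * math.pi * m / beta)  # standard Gaussian integral
assert abs(numeric - analytic) / analytic < 1e-9
# For u(q) = 0 in a box of volume V:  Z_1 = V * analytic**3 / h**3
```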
Best Answer
I prefer to see it in the following way:
\begin{equation} Z_{quantum} \equiv \sum_{m} e^{-\beta E_m} \end{equation}
where $m$ labels a quantum microstate (an eigenstate of the Hamiltonian). Now, you can split the sum into two parts: a sum over quantum microstates that yield the same energy eigenvalue $E_n$, and a sum over all possible values of $E_n$: \begin{equation} Z_{quantum} = \sum_{n = 0}^{\infty} e^{-\beta E_n} \sum_{m \rightarrow E_n} 1 = \sum_{n = 0}^{\infty} g(E_n) e^{-\beta E_n} \end{equation}
where $n$ labels the different levels of the energy spectrum of the system and $g(E_n)$ is the degeneracy of a given level $n$. When the energy spectrum is evenly spaced, it is quite easy to see what happens: $g(E_n)$ is an increasing function of $n$ and in the thermodynamic limit we can assume that: \begin{equation} g(E_n) = e^{S(E_n)/k_B} \end{equation} At the end of the day, the sum looks like: \begin{equation} Z_{quantum} = \sum_{n=0}^{\infty} e^{-\beta F(E_n)} \end{equation}
where $F(E_n) = E_n - k_B T S(E_n)$. Since $E_n$ is increasing with $n$ by construction and $S(E_n)$ is an increasing function of $n$ as well, $F(E_n)$ is ensured to have a minimum at some value $n^*$. At $T=0$, the entropy has no weight and the most probable energy is the ground state $n^* = 0$. As $T$ is increased, $n^*$ shifts towards bigger values of $n$. At very high temperature, $n^* \gg 1$ and one can start to think that some approximation may hold.
The idea consists in saying that the partition function is mostly dominated by those states that are in the "neighbourhood" of $n^*$.
Apart from the fact that it is known that at high energies quantum systems are well described by classical models (coherent states for the harmonic oscillator, classical kinetic energy for a particle in a box, Rydberg states in atoms, etc.), one can try something like: \begin{equation} Z_{quantum} \sim \sum_{n \sim n^*} e^{-\beta F(E_n)} \end{equation}Now, since $F(E_n)$ is at an extremum at $n^*$, it varies very slowly in its neighbourhood, which is a good thing if we want to approximate the sum by an integral. To see this, let us consider the term $e^{-\beta F(E_{n+1})} = g(E_{n+1})e^{-\beta E_{n+1}}$. The degeneracy factor $g(E_n)$ is a number associated to $n$ and could also have been called $g_n$. Let us imagine for a minute that $g(E_n)$ is nothing but a continuous function $\Omega(E)$ evaluated at the spectrum values $E_n$. Imagining such a thing is helpful because if the energy spacing satisfies $\delta E \ll E_n$, then $g(E_{n+1}) = \Omega(E_n+\delta E) = \Omega(E_n)+\Omega'(E_n)\delta E + \mathcal{O}(\delta E^2)$. Now, $e^{-\beta E_{n+1}}=e^{-\beta E_n}e^{-\beta \delta E} = e^{-\beta E_n}\left(1-\beta \delta E + \mathcal{O}(\delta E^2) \right)$. At the end of the day: \begin{equation} e^{-\beta F(E_{n+1})} = \Omega(E_n)e^{-\beta E_n}\left(1 + (\beta_n-\beta)\delta E \right) + \mathcal{O}(\delta E^2) \end{equation}where $\beta_n \equiv \frac{1}{k_B}\frac{\partial S}{\partial E_n}$ is the inverse temperature associated with a system at energy $E_n$, and where I used the fact that $\Omega'(E_n)=\Omega(E_n)\,\beta_n$. In the thermodynamic limit, as long as the energy levels are in the neighbourhood of $n^*$, $\beta_n$ is very close to $\beta$ and thus, as I said qualitatively earlier, the free energy $F(E_n)$ varies very slowly. The idea is then to partition the neighbourhood into $m$ lumps $\{\mathcal{L}_{i}\}_{i=1..m}$ of size $\Delta n$ over which $F(E_n)$ does not vary.
It gives: \begin{equation} Z_{quantum} \approx \sum_{i =1 }^{m} \sum_{n \in \mathcal{L}_i} \Omega(E_n) e^{-\beta E_n} = \sum_{i =1 }^{m} \Omega(E_i) e^{-\beta E_i} \Delta n \end{equation}Defining $\rho(E_i)\Delta E \equiv \Omega(E_i)\Delta n$, this finally yields: \begin{equation} Z_{quantum} \approx \sum_{i =1 }^{m} \rho(E_i) e^{-\beta E_i} \Delta E \approx \int_0^{+\infty} dE\:\rho(E)e^{-\beta E} \end{equation}
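A concrete check of this sum-to-integral step (a sketch using an evenly spaced spectrum $E_n = n\,\delta E$ as an illustrative assumption, for which $\rho(E) = 1/\delta E$): the discrete sum and the continuum integral agree precisely when the spacing is small compared to $k_B T$.

```python
import math

def Z_discrete(beta, dE, nmax=200_000):
    """Direct sum over an evenly spaced spectrum E_n = n * dE."""
    return sum(math.exp(-beta * dE * n) for n in range(nmax))

def Z_continuum(beta, dE):
    """Integral of rho(E) exp(-beta E) with rho = 1/dE, i.e. 1/(beta*dE)."""
    return 1.0 / (beta * dE)

# Spacing much smaller than k_B T: the integral is a good approximation
assert abs(Z_discrete(1.0, 1e-3) / Z_continuum(1.0, 1e-3) - 1.0) < 1e-3
# Spacing much larger than k_B T: the continuum picture breaks down
assert abs(Z_discrete(1.0, 5.0) / Z_continuum(1.0, 5.0) - 1.0) > 0.5
```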
As I said in the comments, the continuous function $\Omega(E)$ and therefore $\rho(E)$ can be determined using a classical integral over phase space.
With the above derivation, we see that the criterion to be in the classical limit is $\delta E \ll E_n$ (for distinguishable particles). If the system is separable into independent Hamiltonians, this criterion becomes essentially $\delta E \ll k_B T$.
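The harmonic oscillator illustrates this criterion, taking $\delta E = \hbar\omega$ (a sketch; the exact quantum result $z = 1/(2\sinh(\beta\hbar\omega/2))$ and the classical phase-space result $z = k_B T/\hbar\omega$ are standard):

```python
import math

def z_quantum(beta_hw):
    """Exact oscillator partition function, 1 / (2 sinh(beta*hbar*w / 2))."""
    return 1.0 / (2.0 * math.sinh(beta_hw / 2.0))

def z_classical(beta_hw):
    """Classical phase-space integral (with the 1/h measure): k_B T / (hbar w)."""
    return 1.0 / beta_hw

# delta E = hbar*w << k_B T: the classical result holds to high accuracy
assert abs(z_quantum(0.01) / z_classical(0.01) - 1.0) < 1e-4
# delta E >> k_B T: the classical result is badly wrong
assert abs(z_quantum(10.0) / z_classical(10.0) - 1.0) > 0.5
```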
This is nice, but unfortunately it means that one has to compute the full quantum energy spectrum of one particle even in the most compliant case (the separable case).
Recently, I have tried to come up with a heuristic way (in the separable case) of getting validity criteria for the classical regime from classical quantities, while knowing that the system is genuinely quantum.
The idea is as follows.
The quantum partition function of a particle in a potential $U(\mathbf{x})$ is: \begin{equation} z_{quantum} = {\rm Tr}\left(e^{-\beta (\hat{K}+U(\hat{\mathbf{x}}))} \right) \end{equation}where $\hat{K}$ is the kinetic energy operator. The quantum character of the partition function is embedded in the non-commutativity of the position and momentum operators, which implies that $[\hat{K},U(\hat{\mathbf{x}})] \neq 0$. To see that, one can use as a first approximation Glauber's formula (a truncated version of the exact Zassenhaus formula): \begin{equation} e^{-\beta (\hat{K}+U(\hat{\mathbf{x}}))} \approx e^{-\beta \hat{K}}e^{-\beta U(\hat{\mathbf{x}})}e^{\frac{\beta^2}{2}[\hat{K},U(\hat{\mathbf{x}})]} \end{equation}If $\beta^2 [\hat{K},U(\hat{\mathbf{x}})] \rightarrow 0$ then quantum correlations between momenta and positions effectively vanish and we get the classical partition function. To see that, note that the trace can be computed in any basis, and we choose the position basis $\{| \mathbf{x} \rangle \}$; we then have: \begin{equation} {\rm Tr}\left[e^{-\beta \hat{K}} e^{ -\beta U(\hat{\mathbf{x}})} \right] = \int \:d\mathbf{x}\: \langle \mathbf{x} |e^{-\beta \hat{K}} e^{-\beta U(\hat{\mathbf{x}})} |\mathbf{x} \rangle \end{equation}Then insert the identity $\mathbf{1} = \int \:d\mathbf{p}\:|\mathbf{p} \rangle \langle \mathbf{p}|$ between the two exponentials, which yields for the partition function $z_{quantum}$: \begin{eqnarray} z &\approx& \int \:d\mathbf{x} \int \:d \mathbf{p}\: \langle \mathbf{x} |e^{-\beta \hat{K}} | \mathbf{p} \rangle \langle \mathbf{p} |e^{ -\beta U(\hat{\mathbf{x}})} |\mathbf{x} \rangle \nonumber \\ &\approx& \int\:d\mathbf{x}\, d\mathbf{p}\: |\langle \mathbf{x}| \mathbf{p}\rangle|^2 e^{-\beta K} e^{ -\beta U(\mathbf{x})} \end{eqnarray}Using the fact that $\langle \mathbf{x}| \mathbf{p}\rangle$ is a plane wave with some constant amplitude $C$, we finally get the classical result: \begin{equation} z \approx C^2 \:\int\:d\mathbf{x}\, d\mathbf{p}\: e^{-\beta( K+U(\mathbf{x}))} \end{equation}
At the end of the day, one has to show that "typical values" of $\beta^2 [\hat{K},U(\hat{\mathbf{x}})]$ are small for the classical limit to hold. One way to do so is to use the Robertson inequality, which says that for any quantum state $|\psi \rangle$: \begin{equation} \beta^2 \sigma_{\hat{K}}(\psi)\sigma_{U(\hat{\mathbf{x}})}(\psi) \geqslant \frac{1}{2}|\langle \psi |\beta^2[\hat{K},U(\hat{\mathbf{x}})] | \psi \rangle| \end{equation}The problem is now to estimate the quantum fluctuations of the kinetic and potential operators. First, if we look at eigenstates of the Hamiltonian then, because the total energy is constant, $ \sigma_{\hat{K}}(\psi) \approx \sigma_{U(\hat{\mathbf{x}})}(\psi)$. We thus need a way to estimate kinetic energy fluctuations, and this is where it becomes hand-waving again for now. One way to do it, I think, is to invoke the time-energy Heisenberg relation $\sigma_{\hat{K}}(\psi) \Delta t \geqslant \hbar$. I think that $\Delta t$ is a feature of the system one is looking at, and in particular should be related to the potential $U$. Let us call $l$ the typical length scale characterizing the potential $U$ and $v$ a typical value of the particle's speed. We use the fact that a typical eigenstate will be around the energy level $n^*$, which most likely implies $v \sim \sqrt{k_B T/m}$. We thus estimate $\Delta t \sim l/v \sim l \sqrt{m/k_B T}$. The most quantum case (called saturation of the Heisenberg inequalities) is when $\sigma_{\hat{K}}(\psi) = \hbar/\Delta t \sim \hbar \sqrt{k_B T/(ml^2)}$. At the end of the day the full criterion is now: \begin{equation} \beta^2 \sigma_{\hat{K}}(\psi)\sigma_{U(\hat{\mathbf{x}})}(\psi) \sim \beta^2 \frac{\hbar^2 k_B T}{ml^2}\sim \frac{\hbar^2 }{k_B T ml^2}\ll 1 \end{equation}In the case of an ideal gas, the distance $l$ is nothing but the size of the box $L$, and one finds $L \gg \lambda_{\beta}$ where I used the notation of Peter here.
This formula applies well to all the cases I know, but I admit it is not very rigorous, and again any advice on how to make it more rigorous is very much welcome.