The number of particles with energy $E_1$ is $0\leq m \leq N$.
There are $N_S\choose N$ possible ways to arrange $N$ particles on $N_S$ sites.
There are $N \choose m $ ways to chose $m$ out of $N$ particles to occupy the upper level $E_1$. The other $N-m$ automatically have to occupy the low-energy state $E_0$, so there are no further combinatorics involed.
This should be enough for you to write down the partition function explicitly. (Hint: there's a sum over $m$) If you still can't make progress, I will post a more detailed answer, but I encourage you to try it for yourself first.
Edit As you stated correctly in your orginal question, the energy of a microstate with $m$ particles occupying energy $E_1$ is $$E=mE_1 + (N-m)E_0 = m(E_1-E_0)+NE_0 \equiv m\Delta E+NE_0$$
The Boltzmann-factor of such a microstate is then $$ e^{-\beta E} = e^{-\beta\left(m(E_1-E_0)+NE_0\right)} $$
You might notice, that only one term in the exponent contains $m\,$! The other is a constant.
The binomial factors accounting for the degeneracy in your solution are correct.
/Edit
Edit 2
If i'm not gravely mistaken, your second update looks correct! Now, why the sum over $m$? According to the priciples of statistical mechanics, the canonical partition function is obtained by summing over all possible configurations, weighting with the Boltzmann factor for the corresponding energy.
You could also think of it as summing over all energies and weighting with an additional factor to take the degeneracy of the energy-states into account.
$$ Z = \sum_{\mu_s}e^{-\beta E_s} = \sum_E g_E e^{-\beta E} $$
The coefficient $g_E$ is also called the density of states (DoS) for exactly the reason that it tells you how many microstates correspond to the energy $E$.
In your problem the energy is labeled by $m$, since this is the only variable it depends on. Thus $\sum_E \rightarrow \sum_m $ and the binomials count the degeneracy or more fancy the DoS.
/Edit 2
Let's say one is after the N-body partition-function of a classical system.
$$ Z_N \propto \int d\Omega_N e^{-\beta H} $$
The integral is over the N-body phase-space and $H$ is the full Hamiltonian.
If the system is non-interacting, $H$ is the sum of $N$ one-particle Hamiltonians
$$ H = H_1+H_2+\cdots+H_N $$ each of which only depends on the coordinates and momentum of a single particle.
The phase-space integral then factorizes into one-body integrals
$$ Z_N \propto \left(\int d\Omega_1 e^{-\beta H_1}\right)\left(\int d\Omega_2 e^{-\beta H_2}\right)\cdots\left(\int d\Omega_N e^{-\beta H_N}\right) = \prod_i Z(i)$$
which is just the product of the single-particle partion functions.
If the Hamiltonians are all the same, e.g $H_i=p_i^2/2m$, all s.p partion-functions coincide and you indeed have $$ Z_N \propto Z_1^N $$
Now, I write proportional, because as you have noticed, there's a factor of $1/N!$ missing.
This comes about when you're dealing with indistinguishable particles. If one does not include this factor, the entropy becomes for example non-additive and you're essentially facing Gibb's paradox.
Note that the notion of indistinguishablity is fundamentally a quantum one! Classically such a thing would not be expected, nevertheless this quantum property prevails in the classical limit.
As to the first part of your question: You speak of the probability of 'the particle' having spin up. I'm not sure that's a sensible question in the context of many bodies. Do you mean the probability of finding exactly one particle in the up-state? If so, think about the energy of such a state and use the definition of $P_i$ you gave.
Best Answer
My favorite way to obtain the canonical partition function is via quantum statistical mechanics and involves essentially only one principle: maximum entropy. The principle says that to obtain the statistical state of a system in a certain ensemble, one extremizes the entropy subject to the constraints that define the ensemble.
In the context of quantum statistical mechanics for a system in the canonical ensemble, one extremizes the so-called von-Neumann entropy $$ S_\mathrm{vn}(\rho) = -k\,\mathrm{tr}(\rho\ln\rho) $$ subject to the constraint that the ensemble average energy has some fixed value $E$; $$ \mathrm{tr}(\rho H) = E $$ Here $\rho$ denotes the density operator of the system, and $H$ is its Hamiltonian. This constraint is, in fact, one way of defining the canonical ensemble. This is a constrained optimization problem that can be solved using the method of Lagrange multipliers. The result is that the density operator of the system is. $$ \rho = \frac{1}{Z}e^{-\beta H}, \qquad Z = \mathrm{tr}(e^{-\beta H}) $$ where $\beta$ is the Lagrange multiplier corrsponding to the constraint of fixed ensemble average energy.
Important Digression. If you use the derivation above, it's not at all clear a priori why the multiplier $\beta$ is inverse temperature. The multiplier $\beta$ can be identified with inverse temperature using the following argument. Notice that for $\rho$ of the canonical ensemble, we have \begin{align} S_\mathrm{vn}(\rho) &= -k\mathrm{tr}\left(\rho\ln \frac{e^{-\beta H}}{Z}\right)\\ &= -k\mathrm{tr}\left(\rho(\ln e^{-\beta H}-\ln Z)\right)\\ &=k\mathrm{tr}\left(\rho(\beta H+\ln Z)\right)\\ &= k(\beta \mathrm{tr}(\rho H) + \ln Z)\\ &= k(\beta E + \ln Z) \end{align} Now, notice that the Lagrange multiplier is actually a function of $E$, the constrained value of the ensemble average of $H$, so we have $$ \frac{\partial S_\mathrm{vn}}{\partial E} = k\left(\beta ' E + \beta + \frac{1}{Z}\frac{\partial Z}{\partial E}\right) $$ where $\beta'$ denotes the derivative of $\beta$ with respect to $E$, but \begin{align} \frac{1}{Z}\frac{\partial Z}{\partial E} &= \frac{1}{Z}\frac{\partial}{\partial E} \mathrm{tr}(e^{-\beta H}) = \mathrm{tr}(-\beta'He^{-\beta H}/Z) = -\beta'\mathrm{tr}(\rho H) = -\beta' E \end{align} so that putting this all together, we get $$ \frac{\partial S_\mathrm{vn}}{\partial E} = k\beta $$ on the other hand, recall that the thermodynamic temperature satisfies $$ \frac{1}{T} = \frac{\partial S}{\partial E} $$ so that if we identify the von-Neumann entropy with the thermodynamic entropy ($S = S_\mathrm{vn}$) gives $$ \beta = \frac{1}{kT} $$ as desired.