My first comment was more to react to the pathologies arising from the $\beta \rightarrow 0$ limit.
I kind of understand that if instead one considers a case where there is a degeneracy $g = e^{\lambda a}$ with $a > 0$ for a state of energy $\epsilon$ then the rescaled energy becomes:
$u(\beta) = \frac{\varepsilon \: e^{\lambda(a-\beta \varepsilon)}}{1+e^{\lambda(a-\beta \varepsilon)}}$
Taking the limit when $\lambda \rightarrow + \infty$ leads indeed to a case where
$u(\beta) = \varepsilon$ if $\beta \varepsilon < a$
$u(\beta) = 0$ if $\beta \varepsilon > a$
In this particular case, I understand the origin of the transition as there is a balance between entropy gain and energy loss. In the case mentioned in the original question however, there is no reason for the system to be in the high energy state even at infinite temperature.
This kind of transition reminds me of superselective particles one can encounter in drug design for instance.
As an additional note, although the complete symmetry breaking is only observed in the thermodynamic limit, the latter is not required for the high energy state to be more favoured than the low energy one.
In fact, the free energy of the state $u=0$ is simply $F(u=0)=E-TS=0$ while the free energy of the high energy state is simply
$F(u=\varepsilon)= \lambda \varepsilon -T k_B \ln e^{\lambda a} = \lambda(\varepsilon- k_B T a)$
Hence, requiring that $F(u=\varepsilon) < F(u=0)$ requires only that $\beta \varepsilon = a$ irrespective of the value of $\lambda$ and this defines the transition for a finite size system.
Of course, as $\lambda$ grows bigger and bigger, the spread between the two macrostate free energies increases and the high energy state becomes overwhelmly more likely than the low energy one.
If one follows the Ehrenfest classification of phase transitions, the above transition is first order as the mean energy is discontinuous at the transition (let's not focus about definition subtleties involving divergences here which are discussed in the previous link if needed).
Now, if one is more interested in the criticality involved in some phase transitions, then he won't probably find it here as they are principally found in second order phase transitions.
One can understand why this is the case as basically a second order phase transition does not display a jump in the order parameter at the transition. Hence, that is the reason why, at the transition, one can use indefinitely the renormalization group (in the sense of Wilson) on the so called critical manifold.
For a first order phase transition however, the order parameter jumps at the transition and hence, using a renormalization scheme will end up in the dominating phase ($u = 0$ or $u=\varepsilon$) however close we are to the transition.
I'll give a very qualitative answer / overview.
The classification 'first-order phase transition vs. second-order phase transition' is an old one, now replaced by the classification 'first-order phase transition vs. continuous phase transition'. The difference is that the latter includes divergences in 2nd derivatives of $F$ and above - so to answer your question, yes there are other orders of phase transitions, in general.
Note that there are phase transitions that do not fall into the above framework - for example, there are quantum phase transitions, where the source of the phase transitions is not thermal fluctuations but rather quantum fluctuations. And then there are topological phase transitions such as the Kosterlitz–Thouless transition in the XY model.
The framework to understanding the thermal phase transitions is statistical field theory. A very important starting point is Ginzburg theory, and then you upgrade it to Landau-Ginzburg theory. In a nutshell, phases are distinguished by the symmetries they possess. For example, the liquid phase of water is rotationally symmetric and translationally symmetric, but the solid phase (ice) breaks that rotational symmetry because now it only has discrete translational symmetry. So there must be some phase transition between these two phases. Liquid and gas possess the same symmetry and so actually can be identified as the same phase, as evidenced by being able to go from liquid to gas by going around the critical point instead of through the liquid-gas boundary in the phase diagram. LG theory involves writing a statistical field theory of the system respecting symmetries of the system, and then studying how the solution to the field equations respects the symmetry or not against the temperature.
Now we don't really deal with first order phase transitions as much as continuous phase transitions. I can give a few reasons:
First-order phase transitions aren't very interesting. You can model them by Landau-Ginzburg theory in the mean field approach by adding appropriate terms in the effective action (like $m^3$, $m^4, m^5, m^6$, $m$ being the order parameter [yes, note that odd terms are allowed - they explicitly break the symmetry. Although for reasons of positive-definiteness the largest power must be even.]).
First-order phase transitions depend on the microscopic details of the system, so we don't learn much information about such a PT from analyzing one system.
Or perhaps, we just don't know how to really deal with first-order phase transitions very well.
Continuous phase transitions have a diverging correlation length (first order ones typically do not). This implies a few very important things:
a) Microscopic details are washed out because of the diverging correlation length. So we expect continuous phase transitions to be classified into universality classes. By that I mean that near such a critical point, thermodynamic properties diverge with some critical exponents with the order parameter, and these set of critical exponents fall into classes that can be used to classify different PTs. Refer to Peskin and Schroeder pg 450 - we see that the critical point in a binary liquid system has the same set of exponents as that of the $\beta$-brass critical point! And the critical point in the EuO system is the same as the critical point in the Ni system. Interesting, no?
b) We can use established techniques such as renormalization to extract information of the critical exponents of the critical points. Try this paper by Kadanoff.
Ok, so as I said this is a very qualitative answer, but I hope it points you in some (hopefully right) direction.
Best Answer
In a sense you are right. The heat capacity only becomes discontinuous for a system of infinite extent. For all others it is continuous.
But that's a theoretical concern only. At the theoretical point of discontinuity the slope of the heat capacity is infinite. Pick any finite value for the slope, no matter how large and I can find a finite system where the slope is larger than that.
As for "impurities", there is no problem handling multicomponent systems. Chemists do it all the time. The results are only slightly more complex than for single component systems and my remarks about the phase transition above are still true.
If I don't seem to understand your question, please comment.