The ergodic hypothesis is not part of the foundations of statistical mechanics. In fact, it only becomes relevant when you want to use statistical mechanics to make statements about time averages. Without the ergodic hypothesis statistical mechanics makes statements about ensembles, not about one particular system.

To understand this answer you have to understand what a physicist means by an ensemble. It is the same thing as what a mathematician calls a probability space. The "Statistical ensemble" Wikipedia article explains the concept quite well; it even has a paragraph explaining the role of the ergodic hypothesis.

The reason some authors make it look as if the ergodic hypothesis were central to statistical mechanics is that they want to justify their focus on the microcanonical ensemble. The justification they give is that the ergodic hypothesis holds for that ensemble: for an ergodic system, the time spent in any region of the accessible phase space is proportional to the volume of that region. But that is not central to statistical mechanics. Statistical mechanics can be done with other ensembles, and there are other ways to justify a particular ensemble; the canonical ensemble, for example, can be derived as the distribution that maximises entropy at fixed mean energy.

A physical theory is only useful if it can be compared to experiments. Statistical mechanics without the ergodic hypothesis makes statements only about ensembles, so it is useful only if you can make measurements on the ensemble. That means it must be possible to repeat an experiment again and again, with the frequency of obtaining particular members of the ensemble determined by the probability distribution you used as the starting point of your statistical mechanics calculations.

Sometimes, however, you can only experiment on one single sample from the ensemble. In that case statistical mechanics without an ergodic hypothesis is not very useful because, while it can tell you what a typical sample from the ensemble would look like, you do not know whether your particular sample is typical. This is where the ergodic hypothesis helps. It states that the time average taken in any particular sample is equal to the ensemble average. Statistical mechanics allows you to calculate the ensemble average. If you can make measurements on your one sample over a sufficiently long time, you can take the time average, compare it to the predicted ensemble average, and hence test the theory.
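The equality of time and ensemble averages is easy to see in a toy ergodic system. Here is a minimal sketch (a hypothetical two-state Markov chain, not any particular physical system): the time average of an observable along one long trajectory converges to the average taken over the stationary ensemble.

```python
import random

random.seed(0)

# Hypothetical two-state system: transition probabilities chosen so the
# stationary (ensemble) distribution is pi = (0.75, 0.25).
p01 = 0.1   # probability of jumping from state 0 to state 1
p10 = 0.3   # probability of jumping from state 1 to state 0

# Ensemble average of the observable f(state) = state:
# pi_1 = p01 / (p01 + p10) = 0.25
ensemble_average = p01 / (p01 + p10)

# Time average along a single long trajectory.
state = 0
total = 0
steps = 200_000
for _ in range(steps):
    if state == 0:
        if random.random() < p01:
            state = 1
    else:
        if random.random() < p10:
            state = 0
    total += state

time_average = total / steps
print(ensemble_average, time_average)  # both close to 0.25
```

For an ergodic chain like this one, the two numbers agree up to statistical noise; for a non-ergodic system (say, one with two disconnected sets of states) the time average would depend on the initial state and the comparison would fail.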

So in many practical applications of statistical mechanics the ergodic hypothesis is very important, but it is not fundamental to statistical mechanics, only to its application to certain sorts of experiments.

In this answer I took the ergodic hypothesis to be the statement that ensemble averages are equal to time averages. To add to the confusion, some people say that the ergodic hypothesis is the statement that the time a system spends in a region of phase space is proportional to the volume of that region. These two are the same when the ensemble chosen is the microcanonical ensemble.

So, to summarise: the ergodic hypothesis is used in two places:

- To justify the use of the microcanonical ensemble.
- To make predictions about the time average of observables.

Neither is central to statistical mechanics, as 1) statistical mechanics can be, and is, done for other ensembles (for example those determined by stochastic processes) and 2) often one does experiments with many samples from the ensemble rather than with time averages of a single sample.

EDIT: My answer assumes that you're looking for a book at the introductory graduate level.

I found Pathria's "Statistical Mechanics" (2nd ed) very helpful during my first-year graduate statistical mechanics course. Pathria's treatment of the subject is mathematically careful and detailed, at least by physics standards; I found his discussion of Liouville's theorem (part 1 of your question) satisfactory. Unfortunately, like many formal treatments, Pathria discusses few interesting applications.

"Statistical Physics of Particles" by Kardar appears to be supplanting Pathria as the favored introductory graduate text; it was used at Boston University and at Caltech during my time there. Kardar is very terse and would probably have to be supplemented by another book, but the problems he offers are interesting (if hard). In fact, about a third of the text consists of detailed solutions to the problems.

I have heard good things about Reichl's book, already mentioned in another answer. I used it briefly as a reference: the coverage of kinetic theory is more complete than in other sources. It is more accessible than Pathria, not to mention Kardar.

## Best Answer

Thermodynamics today is subsumed as the continuum limit of statistical mechanics. For statistical mechanics, the closest thing to an axiomatic deduction of the laws is Jaynes's approach, detailed in a series of papers starting in the 1950s. The basic law is that every conserved quantity has a thermodynamic conjugate, and the statistical ensemble is the maximum-entropy distribution consistent either with the values of those conjugates (when the conserved quantity itself is not fixed) or with the value of the conserved quantity.
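The maximum-entropy construction can be checked numerically in a small example. The sketch below uses a hypothetical three-level system (the energies and temperature are picked only for illustration): fixing the mean energy and maximising the Shannon entropy yields the canonical distribution p_i ∝ exp(-βE_i), which has higher entropy than a competing distribution with the same mean energy.

```python
import math

# Hypothetical three-level system; energies chosen only for illustration.
energies = [0.0, 1.0, 2.0]
beta = 1.2  # inverse temperature: the Lagrange multiplier for mean energy

Z = sum(math.exp(-beta * E) for E in energies)    # partition function
p = [math.exp(-beta * E) / Z for E in energies]   # canonical ensemble

def entropy(dist):
    """Shannon/Gibbs entropy S = -sum p ln p."""
    return -sum(x * math.log(x) for x in dist if x > 0)

def mean_energy(dist):
    return sum(x * E for x, E in zip(dist, energies))

# Perturb p along a direction that preserves both normalisation and mean
# energy: for these energies the direction (1, -2, 1) does both, since
# 1 - 2 + 1 = 0 and 1*0 - 2*1 + 1*2 = 0.
eps = 0.02
q = [p[0] + eps, p[1] - 2 * eps, p[2] + eps]

print(abs(mean_energy(q) - mean_energy(p)) < 1e-9)  # same constraint value
print(entropy(p) > entropy(q))                      # canonical entropy wins
```

Any other mean-energy-preserving perturbation gives the same verdict, since the entropy is strictly concave and the canonical distribution is its unique maximum on the constraint surface.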

The philosophy behind this is that statistical mechanics is really a calculus regarding our knowledge of the microscopic state of a macroscopic body. It is in many ways a rigorous completion of the formalism of 19th century thermodynamics. It has been discussed here before; you can find three classic references (freely available, thanks to Physical Review) linked in Jaynes's Wikipedia article.