[Physics] Axioms behind entropy!

Tags: condensed-matter, definition, entropy, statistical-mechanics, thermodynamics

The concept of entropy is ubiquitous: we encounter it from information theory (Shannon entropy) to its basic definition in statistical mechanics in terms of the number of micro-states.

Limiting the discussion to physics: when studying a physical system, be it a box filled with an ideal gas, a melt of polymers, or the state of rods/molecules in a liquid-crystalline system, we define specific entropic terms to describe the evolution of the system (by including them in the free-energy expression).

From a statistical mechanics point of view, we use Boltzmann's definition: $$S=k_B \ln\Omega$$
where $\Omega$ is the number of microstates within a given macrostate (or, more generally, the statistical weight of the relevant ensemble). But of course we almost never use this exact form of entropy when studying real systems, as it is practically impossible to count the microstates. Instead we define entropic terms based on macroscopic variables of a system, like the following (examples among the usual ones):

  • For a perfect gas of $N$ atoms in a volume $V$, one can write the entropy per atom as: $$S_{\rm ideal}=k_B \ln\left(a\frac{V}{N}\right)$$
    with $a$ a constant.

  • In soft condensed matter, when studying liquid crystallinity, we often define an orientational entropy describing the entropy lost when molecules become oriented. In its most general form it is defined as: $$S_{\rm orient}=-k_B \int f(\theta)\ln f(\theta)\,d\Omega$$ where $f(\theta)$ is the orientation distribution function and $d\Omega$ a small solid angle (a numerical sketch follows this list).

  • In polymer physics, there are often entropic terms attributed to the homogeneity of the number-density distribution of monomers $n(i)$ along the chains (often called the Lifshitz entropy), described as $$S_{\rm homogeneity}\propto -\left(\nabla \sqrt{n(i)}\right)^2$$ which is just the squared gradient of the square root of the density distribution.
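As a concrete illustration of the orientational case mentioned above, here is a minimal numerical sketch that evaluates $S_{\rm orient}=-k_B \int f(\theta)\ln f(\theta)\,d\Omega$ for an axially symmetric trial distribution $f(\theta)\propto e^{\alpha\cos^2\theta}$; the trial form and the alignment parameter $\alpha$ are illustrative choices of mine, not part of the question.

```python
import numpy as np
from scipy.integrate import quad

kB = 1.0  # work in units of k_B

def orientational_entropy(alpha):
    """S_orient = -k_B * integral of f(theta) ln f(theta) dOmega for an axially
    symmetric f, with dOmega = 2*pi*sin(theta) dtheta and f normalized to 1."""
    f_unnorm = lambda t: np.exp(alpha * np.cos(t)**2)
    norm, _ = quad(lambda t: f_unnorm(t) * 2 * np.pi * np.sin(t), 0, np.pi)
    f = lambda t: f_unnorm(t) / norm
    integrand = lambda t: f(t) * np.log(f(t)) * 2 * np.pi * np.sin(t)
    val, _ = quad(integrand, 0, np.pi)
    return -kB * val

# alpha = 0 is the isotropic state; larger alpha means stronger alignment
for alpha in [0.0, 2.0, 10.0]:
    print(f"alpha = {alpha:5.1f}   S_orient/k_B = {orientational_entropy(alpha):+.3f}")
```

For $\alpha=0$ this reproduces the isotropic value $\ln(4\pi)\approx 2.53$ per $k_B$, and the entropy decreases monotonically as the distribution becomes more aligned, which is exactly the qualitative behaviour required of an entropy below.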

In all such cases it is relatively straightforward to see why we refer to such state functions as "entropic": they all boil down to describing a certain kind of disorder in the system, and their inclusion affects the equilibrium state of the system. But the underlying question is how we give physical and mathematical bounds to these entropic terms in a consistent way; hence there must be a set of axioms that a function must fulfill in order to qualify as an entropic term.

To elaborate on the question:

On the one hand, from a physical point of view, the entropy should reach its maximum in the most disordered state, and its minimum in the most ordered case (i.e. one possible micro-state, complete certainty about the state). For example, in liquid crystals again, we want $S_{\rm orient}$ to be at a maximum in the orientationally disordered (isotropic) state, and at a minimum in completely ordered states, like the smectic.

On the other hand, mathematically we require that:

  1. $S$ to be continuous at every $\Omega$
  2. To be extensive with system size
  3. Differentiable (does it always have to be?)
  4. Path independent: state function
  5. …what else?

Clearly, if we don't know how to bound such functions based on a set of axioms, we cannot know whether they make sense physically and mathematically.

(Helpful thought scenario: imagine studying a system of particles in suspension whose positional distribution comes with a certain periodicity. If we are to attribute an entropic term to the state of periodicity of the system, what conditions should such a state function satisfy? A toy version is sketched below.)
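For the thought scenario, a toy positional-entropy functional might look like the following sketch; the functional form $-k_B\int\rho\ln\rho\,dx$ and the modulated density $\rho(x)\propto 1+A\cos(2\pi k x/L)$ are illustrative assumptions of mine, not something given in the question.

```python
import numpy as np
from scipy.integrate import quad

# Toy positional entropy S = -k_B * integral of rho(x) ln rho(x) dx on a line of
# length L, comparing a uniform density with a periodically modulated one.
kB, L, k = 1.0, 1.0, 5

def positional_entropy(A):
    rho_unnorm = lambda x: 1.0 + A * np.cos(2 * np.pi * k * x / L)
    norm, _ = quad(rho_unnorm, 0, L)                      # normalize rho to 1
    rho = lambda x: rho_unnorm(x) / norm
    val, _ = quad(lambda x: rho(x) * np.log(rho(x)), 0, L)
    return -kB * val

for A in [0.0, 0.3, 0.6, 0.9]:
    print(f"modulation A = {A:.1f}   S/k_B = {positional_entropy(A):+.4f}")
# A = 0 (uniform, most disordered) maximizes S; growing periodic order lowers it
```

Whatever its detailed form, any candidate functional of this kind would still have to satisfy the conditions asked about below.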

  • What are the axioms to be satisfied by a state function for it to qualify as entropy?

Best Answer

Ultimate physical motivation

Strictly in the sense of physics, the entropy is less free than it might seem. It always has to provide a measure of the energy released from a system that is not graspable by macroscopic parameters, i.e. it has to be subject to the relation $${\rm d}U = {\rm d}E_\text{macro} + T {\rm d} S$$ It has to carry all the forms of energy that cannot be expressed macroscopically, which we summarize as "heat", although the actual physics behind this "heat" might be quite different from the notions familiar from gases etc. If entropy does not satisfy this relation, it is not a physical entropy. This is a full characterization of entropy for macrophysics. I am going to use only this definition, not the cases where entropy is a handle to talk about information.
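As a minimal sanity check of this constraint, here is a sketch for the textbook case of a reversible isothermal expansion of an ideal gas (the setup and numbers are my own illustration): there ${\rm d}U=0$, so the entropy change obtained by integrating $\delta Q_\text{rev}/T = p\,{\rm d}V/T$ must coincide with the statistical result $\Delta S = N k_B \ln(V_2/V_1)$.

```python
import numpy as np
from scipy.integrate import quad

# Reversible isothermal expansion of an ideal gas: dU = 0, so T dS = p dV.
# Compare the thermodynamic route (integrating reversible heat over T) with
# the statistical ideal-gas result Delta S = N k_B ln(V2/V1).
kB = 1.380649e-23        # J/K
N  = 6.022e23            # one mole of particles
T  = 300.0               # K
V1, V2 = 1.0e-3, 2.0e-3  # m^3

p = lambda V: N * kB * T / V                       # ideal-gas equation of state
dS_thermo, _ = quad(lambda V: p(V) / T, V1, V2)    # integral of p dV / T
dS_stat = N * kB * np.log(V2 / V1)

print(f"Delta S from dU = dE_macro + T dS : {dS_thermo:.4f} J/K")
print(f"Delta S from N k_B ln(V2/V1)      : {dS_stat:.4f} J/K")
```

Both routes give $N k_B \ln 2 \approx 5.76\ \mathrm{J/K}$; an "entropy" that failed this kind of bookkeeping would not be a physical entropy in the sense above.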


Statistical formulation

This constraint does leave some freedom in the statistical definition of entropy, but that freedom has no effect in practice. It comes down to the fact that we take the $N\to \infty$ and $V \to \infty$ limits, in which a lot of the detail of the definition gets smeared out. We can, for example, define the phase-space volume of the microcanonical ensemble in three distinct ways. The first one is $$\Omega_\text{sharp} = \int_{\sum E = U} d \mu$$ where $\mu$ is some kind of measure over the space of states. Or we can put $$\Omega_\text{non-sharp} = \int_{\sum E \in (U-\varepsilon,U)} d \mu$$ or even $$\Omega_\text{nobody cares} = \int_{\sum E < U} d \mu$$ Any of these will work for $S = k_B \log \Omega$ in the mentioned limits (the limit gives the same $S$). But this is more of a relic of the large limits - the physically plausible option is $\Omega_\text{sharp}$.
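A quick way to see this smearing-out is the momentum hypersphere of an ideal gas, of radius $R=\sqrt{2mU}$ in $n=3N$ dimensions: $\Omega_\text{nobody cares}$ is proportional to the volume of the ball, $\Omega_\text{sharp}$ to its surface, and their logarithms differ only by terms of order $\ln n$, negligible against the $O(n)$ terms that make up the entropy. A small sketch of this (my own illustration, ignoring the shell thickness and all prefactors, which only add further sub-extensive terms):

```python
import numpy as np
from scipy.special import gammaln

# log(volume of the n-ball of radius R) vs log(surface of the (n-1)-sphere):
# the two candidate Omegas differ only by O(ln n) in their logarithms.
def log_ball_volume(n, R):
    return 0.5 * n * np.log(np.pi) + n * np.log(R) - gammaln(n / 2 + 1)

def log_sphere_area(n, R):
    return np.log(2) + 0.5 * n * np.log(np.pi) + (n - 1) * np.log(R) - gammaln(n / 2)

R = 1.0                                     # units chosen so that R = sqrt(2mU) = 1
for n in [3, 30, 300, 3000, 30000]:
    diff = log_sphere_area(n, R) - log_ball_volume(n, R)
    print(f"n = {n:6d}   (ln Omega_sharp - ln Omega_nobody_cares)/n = {diff/n:.2e}")
# the per-degree-of-freedom difference (here ln(n)/n) vanishes as n grows
```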

The much more important issue is counting the number of relevant states, the transition from discrete states to continuous ones and why we should consider them "democratic". This would be a very long argument involving ergodicity and so on.

For ergodic Hamiltonian systems, the probability measure is certainly proportional to $d^n x\, d^np$, where $n$ is the number of degrees of freedom. From quantum mechanics we know that the "democracy" factor between discrete and continuous states makes this measure $d^n x\, d^np/h^n$, with $h$ the Planck constant. (Only the relative weights matter, since we normalize anyway.)

The conclusion is that the procedures of statistical physics, for a given system, can give us entropy unambiguously (up to an additive constant representing the freedom of state normalization).
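A toy example of this unambiguity (my own illustration, using two-state spins rather than a continuous phase space): once the macrostate is fixed, the exact Boltzmann count $S = k_B\ln\Omega$ with $\Omega = \binom{N}{M}$ and its Stirling approximation converge to the same entropy per degree of freedom as $N$ grows.

```python
from math import comb, log

# Exact microstate counting for N two-state spins with M spins "up", compared
# with the large-N (Stirling) form of the entropy per spin.
def entropy_exact(N, M):
    return log(comb(N, M))                  # S/k_B = ln(Omega), Omega = C(N, M)

def entropy_stirling(N, M):
    p = M / N
    return -N * (p * log(p) + (1 - p) * log(1 - p))

for N in [10, 100, 1000, 10000]:
    M = N // 4                              # fix the macrostate: a quarter "up"
    ex, st = entropy_exact(N, M), entropy_stirling(N, M)
    print(f"N = {N:6d}   exact S/(N k_B) = {ex/N:.4f}   Stirling = {st/N:.4f}")
```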


Hand-wavy conclusion

So there is always one entropy for every situation, and we know how to derive it. The trick is only to specify which degrees of freedom are "free", or get randomized in some complicated interaction, and then turn on the statistics.

But there are some loopholes. We see that the justification of the whole procedure (the "democratization" of states) relies on the Hamiltonian formulation and basically also on quantization. But we know quantization is more of an art than a science, and the statistical procedure can run into very similar problems as quantization. Are we always sure what the macroscopic parameters of a system are? How do we describe the situation when we observe the microstate directly? What would be the entropy of a relativistic space-time? Which would be the "activated" degrees of freedom? Etc. But this is a question for the "art of physics".


Additional note: "Art of physics" - modelling and confirming

A brief comment on "the art of physics". As with any physical model or approximation, there are three criteria:

  1. Foundation on (more) elementary physics
  2. Self-consistence of result with assumption
  3. Empirical verification

Say we have an open system $\Xi$ with a channel of particle inflow. However, we only know how to compute the parameters relevant for the inflow at small number densities in $\Xi$, because only then can we use a one-particle model of particles entering and leaving the system. The one-particle model would be point 1 - the foundation on more elementary physics. We thus presume a low number density and compute the statistics of the system.

But this is where the theorist's work should not stop: the next step is to check under which choices of parameters the density really is sufficiently low, and to identify these regions in parameter space - this is point 2. However, this is still a rather primitive conception. For a serious model, the theorist should at least check whether two- and higher-particle models of inflow cannot suddenly take over even at low densities, and investigate under what conditions they do not. This is point 1 mixing with point 2.

Nevertheless, there is also point 3 - empirical verification. It would be very naïve to pretend that the theorist is able to anticipate all possible effects. In fact, Einstein's papers are well known for just shooting out a model, without long mathematical discussions of neglected effects, and giving experimental predictions right away. Sometimes intuition rules (and sometimes it does not).

In the case of entropy, this would be achieved by measuring the heat response of the system. It is not only heat capacities of the form $$C_{X} = T\left(\frac{\partial S}{\partial T}\right)_{X=\text{const}}$$ (where $X$ denotes the remaining parameters held fixed), but also a lot of other response coefficients involving temperature, as specified e.g. by the Maxwell relations.
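As a sketch of what such a check looks like in the simplest case, one can take the standard Sackur-Tetrode entropy of a monatomic ideal gas (a known closed form, used here purely for illustration and not derived in this answer) and read off the measurable responses symbolically: the constant-volume heat capacity and one Maxwell relation.

```python
import sympy as sp

# Symbolic check of how entropy shows up in measurable heat response, using
# the Sackur-Tetrode entropy of a monatomic ideal gas as the model.
T, V, N, kB, h, m = sp.symbols('T V N k_B h m', positive=True)

lam = h / sp.sqrt(2 * sp.pi * m * kB * T)           # thermal de Broglie wavelength
S = N * kB * (sp.log(V / (N * lam**3)) + sp.Rational(5, 2))

C_V = sp.simplify(T * sp.diff(S, T))                # C_V = T (dS/dT) at constant V
dSdV = sp.simplify(sp.diff(S, V))                   # Maxwell: (dS/dV)_T = (dp/dT)_V
dpdT = sp.diff(N * kB * T / V, T)                   # with p = N k_B T / V

print("C_V       =", C_V)                           # expect 3*N*k_B/2
print("(dS/dV)_T =", dSdV, "   (dp/dT)_V =", dpdT)  # both expect N*k_B/V
```

Both outputs reproduce the textbook values $C_V = \tfrac{3}{2}Nk_B$ and $(\partial S/\partial V)_T = (\partial p/\partial T)_V = Nk_B/V$, which is exactly the kind of consistency an experiment would probe.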

So the answer would be: if a well-developed model predicting the entropy quantitatively exists and is confirmed by thorough testing, that entropy qualifies as the unique entropy of the system.


Additional note: Observed mathematical conditions

Let's say the physical motivation is paramount. Then the strongest statement we can make is the following:

  • Entropy is a single-valued function of the full set of macroscopic parameters. (I.e. if it is not it might also be because the list of parameters is not complete.)
  • Entropy has a finite difference between any two points in the macro parameter space. I.e. $|\Delta S|<\infty$.
  • Entropy is homogeneous in the parameters defined by physical criteria as "extensive". I.e. for a complete set of extensive parameters $A_i$ we have $S(\lambda A_1, ...,\lambda A_n, ...) = \lambda S(A_1,...,A_n,...)$ for all $\lambda > 0$.

In phase transitions as common as freezing/melting, the entropy is even discontinuous; hence the criterion of a finite difference rather than continuity. (But this happens only in the $N \to \infty$ limit, as discussed e.g. by Kardar in his lecture notes.) Physically we are able to measure only $\Delta S$, so a strict requirement of a well-defined ${\rm d}S$ is both redundant and impossible for some very common systems.

It is important that "extensivity" is just saying "take another copy of the system": the parameters which double under this operation are extensive, and so is the heat stored in the new "doubled" system. Taking all the extensive parameters and multiplying them by $\lambda$ just means "taking $\lambda$ copies of the system". This all relies heavily on the fact that we are able to identify very clearly the physical operation of "taking another copy of the system".
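The "take $\lambda$ copies" statement can be checked explicitly on any model entropy; here is a sketch using the Sackur-Tetrode form written in its extensive variables $(U, V, N)$ (again a standard closed form, quoted purely for illustration).

```python
import sympy as sp

# Homogeneity check: S(lam*U, lam*V, lam*N) == lam * S(U, V, N) for the
# Sackur-Tetrode entropy written in its extensive variables (U, V, N).
U, V, N, lam, kB, h, m = sp.symbols('U V N lambda k_B h m', positive=True)

S = N * kB * (sp.log((V / N) * (4 * sp.pi * m * U / (3 * N * h**2))**sp.Rational(3, 2))
              + sp.Rational(5, 2))

scaled = S.subs({U: lam * U, V: lam * V, N: lam * N})
print(sp.simplify(scaled - lam * S))   # prints 0: S is first-order homogeneous
```

The difference vanishes for any $\lambda>0$, i.e. the function is first-order homogeneous exactly in the sense of the third bullet above.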

There are cases, such as black hole thermodynamics, where such a notion fails. In a way, the whole space-time is the thermodynamical system, so "take another copy of the system" is hard to specify. (More technically, the formulas are for isolated black holes, and there is no way to screen out gravity other than by infinite distance.) It might seem that the horizon area $A$ would be an extensive parameter, but it actually grows as $\sim M^2$ - we cannot just say "double the mass", because that would not work.
