Statistical Mechanics – How to Understand the Equipartition Theorem in Momentum Space

field-theory, specific-reference, statistical-mechanics, turbulence

Motivated by the answers to this question on turbulence, I'm interested in an explanation and/or derivation/reference of the equipartition theorem in momentum space. To formulate it as a question:

If one considers a physical configuration that admits a description in a dual/momentum space within the framework of statistical mechanics (basically like fields do), what is the realization of the equipartition theorem in these $k$-space terms?

I.e., how does it affect the distribution of energy, and what consequences do different dispersion relations have on the evolution, or even the outcome, of that partition? How do we understand statistical mechanics, not in the initial degrees of freedom, which are usually well motivated by the model-building process, but in abstract $k$-space terms, and what do we typically expect?

Best Answer

General Mumbo-Jumbo about Statistics

When you have any Hamiltonian mechanical system, with degrees of freedom $q_i$, conjugate variables $p_i$, and Hamiltonian $H(q_i,p_i)$, there is a conserved phase space volume, which is just the volume in $(q,p)$ space, defined by the volume element

$$\prod_i dp_i dq_i$$ The conservation of phase space volume is Liouville's theorem, and it is easy to prove directly.

To this you add the plausible, but generally next-to-impossible to prove, assumption of ergodicity: for large enough systems with generic interactions, there are no special surfaces on which the motion is confined, so that any motion is as likely as any other. This is equivalent to the absence of any other conserved quantity defined by a nice analytic surface. All the other conservation laws (which necessarily exist, because every point on the trajectory is determined by the initial conditions, so the initial condition values are the other conserved quantities) give intrinsically complicated, mixed-up surfaces, which become ever more mixed up in the infinite degree of freedom limit, so that points with different values of these phony conserved quantities cannot be meaningfully separated from one another. If this is so, the only real conservation law is the conservation of energy, and this gives you that the correct invariant probability distribution on the phase space is the uniform distribution on the surface H=E (although you must be careful that the measure at any point on the energy surface is defined by the volume in the full phase space between two infinitesimally separated surfaces of energy E and E+dE), and this distribution describes the statistics of nearly every trajectory.

This uniform probability distribution on the constant energy surface is the microcanonical ensemble. The log of the volume of the microcanonical ensemble is the entropy S(E). The assumption of ergodicity says that any point in phase space (p,q) is as likely as any other point, except to the degree that the energy varies in the direction perpendicular to the constant energy surface: the reciprocal of this rate of change gives the local microcanonical density.
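One standard way of writing this (the additive constant in S depends on the conventional choice of the energy window dE) is

$$ \rho_{\rm micro}(p,q) \propto \delta\big(H(p,q) - E\big), \qquad e^{S(E)} \propto \int \delta\big(H(p,q) - E\big) \prod_i dp_i\, dq_i $$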

For a subsystem of a large system, dividing the big system into parts 1 and 2, where 1 is big but small compared to 2, you know that the whole thing is described by the microcanonical ensemble, so that the sub-part 1 is described by the microcanonical ensemble with energy $e$, and the other part by the microcanonical ensemble with energy $E-e$. Statistically speaking, you expect that if system 1 is big, the energy $e$ fluctuates only by a small amount from its average value. The total phase space volume for two independent systems is the product of the volumes of each one, so the entropy (which is the log of the volume) is the sum of the entropies:

$$ S_1(e) + S_2(E-e) \approx S_1(e) + S_2(E) - {\partial S_2\over \partial E}\, e = \mathrm{const} + S_1(e) - \beta e $$

So that the probability, which is found by exponentiating the entropy, is weighted by a factor of $e^{-\beta e}$. The assumption here is that system 2 is a large bath, so that the derivative of its entropy with respect to energy is almost exactly constant over all points of the microcanonical ensemble.
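Exponentiating, the probability for subsystem 1 to carry energy $e$ combines its own density of states with this bath factor:

$$ P(e) \propto e^{S_1(e) + S_2(E-e)} \propto e^{S_1(e)}\, e^{-\beta e} $$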

This tells you that any sub-part of a large system has a distribution on phase space which is given by the canonical ensemble--- the probability of finding the subsystem at any point of its phase space with energy E is proportional to

$$ P(E) = e^{-\beta E} $$

where $\beta = {1\over T}$ defines the absolute thermodynamic temperature (in temperature units where Boltzmann's constant is equal to 1).

The exponential suppression of large energies is easy to understand statistically. There is a conserved quantity, the energy, which is global. In order to absorb a unit of energy, you have to pay a probability cost which is uniform, but otherwise, there is no constraint on the motion. This is the maximum-entropy interpretation of the Boltzmann state--- each subsystem pays a probability cost for absorbing a unit of energy, and this probability cost is adjusted by making sure the total energy is whatever it is. The maximum entropy distribution for any conserved quantity is found by imposing a probability cost for absorbing each unit of the conserved quantity, so that each subsystem is only probabilistically constrained by the amount of each conserved quantity that it has. The log of the probability cost for the quantity is the thermodynamically conjugate variable (or, rather, it would be, if thermodynamics had developed logically rather than historically. In actual life, people multiply all the thermodynamic conjugate quantities by the temperature for no good reason, to turn the more fundamental extra log-probability quantity into a less fundamental quantity which is the extra free energy per unit conserved quantity, so that the thermodynamically conjugate quantity to U has units of energy per unit U, rather than (dimensionless, quantumly additively unambiguous) entropy per unit U. I try not to use this otherwise universal convention, because I think it is wrongheaded. Also, it is good to use $\beta$ instead of T most of the time, since $\beta$ is the thermodynamically conjugate variable to E.)

The thermodynamic formalism is not just a solution to the problem of deriving thermodynamics--- it also gives a solution to the generally unsolvable problem of the statistics of a generic motion in a mechanical system. If you ask "what does a typical motion look like for a subsystem of a big system without any conserved quantities other than the energy?", the answer can only be the canonical ensemble for the individual subsystems that make up the system. If the motion doesn't look like this yet, it must be in a special state, and this state will be unstable to spreading out in phase space. When the state is fully randomized, the statistics will be those of the canonical ensemble, with a temperature determined by matching the energy at temperature T to the total energy.

This is remarkable, because the general problem of describing a deterministic thing statistically is impossibly mathematically difficult in any rigorous way. If you take a Rubik's cube, and shuffle it by a sequence of moves of the form "turn the front face clockwise, then turn the cube in a direction determined by the next digit of pi modulo 4, and repeat", you will not be able to prove anything about the state you get. But it is obvious that the probability distribution of the colors you see will always eventually be indistinguishable from the uniform distribution on all Rubik's cube configurations, even though you can't prove anything of the sort with any rigor.

For another, simpler example: take a long string of binary digits which ends with "1", shift it one position to the left while concatenating a 1 at the rightmost position (this turns $n$ into $2n+1$), binary-add this to the original string, and shift the result of the addition to the right until you get rid of all the zeros on the right. This is a deterministic procedure on bit strings which you can iterate. It is clear by doing it a few times that you always quickly get a randomized pattern of bits, where any pattern of 1s and 0s is equally likely in any small window.

This process is well known in mathematics--- it is the 3n+1 procedure, the Collatz problem. It is a simple consequence of eventual randomization that the 3n+1 Collatz conjecture is true, that all finite patterns reach "1" eventually (because all infinite random bit strings are shifted to the right after many iterations with probability 1). But to prove this conjecture rigorously is well beyond current mathematical methods. So proving that a deterministic system turns random in any meaningful way is generally extremely difficult. Even so, seeing that it turns random is generally not difficult--- you can identify the stochasticity by eye and by simple statistical tests. Further, it is often not difficult to identify what the correct probability distribution should be, once it turns random, just by identifying the conserved quantities in the problem, and making a distribution function from these conserved quantities which is preserved under time evolution. This is the source of many conjectures.
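Here is a minimal sketch of the bit-string procedure (my own illustration, not part of the original answer), just to make the equivalence with the 3n+1 map concrete:

```python
def collatz_bitstring_step(bits):
    """One step of the bit-string procedure on an odd binary string.

    Shifting left while appending a '1' turns n into 2n+1; binary-adding
    the original string gives 3n+1; stripping the trailing zeros divides
    by 2 until the result is odd again.
    """
    n = int(bits, 2)                   # the string, read as an odd integer
    shifted = bits + '1'               # left shift with a 1 appended: 2n+1
    total = n + int(shifted, 2)        # binary addition: n + (2n+1) = 3n+1
    return bin(total)[2:].rstrip('0')  # shift right past the trailing zeros

# Iterating from any odd seed scrambles the bit pattern very quickly,
# and (empirically) every seed eventually reaches the string '1'.
s = bin(2**61 + 12345)[2:]
for _ in range(40):
    s = collatz_bitstring_step(s)
print(s)  # a random-looking pattern of 1s and 0s
```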

Boltzmann solved this problem of the statistics of the invariant distribution for mechanical systems in equilibrium: the general solution to the problem of the statistics of the deterministic motion on a subsystem, starting from any initial conditions and for long enough times, is given by the Boltzmann distribution defining the canonical ensemble.

So nontrivial statistical problems in deterministic systems are physics, not rigorous mathematics, barring a major breakthrough in mathematical methods. The reason is that the physicist does not worry about justifying randomization in a rigorous sense, only in a scientific sense--- if a system looks random and passes the appropriate statistical tests, it is scientifically random (although, of course, mathematically you haven't proved anything). You can think of this as an ad-hoc axiom schema about the statistical properties of various deterministic automata, although if improperly formulated, some of these axioms will be false, and I don't think that very many of these axioms are likely to be independent of powerful enough set-theoretic axioms, and all of them should probably be resolved by large enough cardinals or finitistic analogs of large cardinals. So they are not really new axioms per se, just an infinite list of 3n+1 Collatz-style conjectures, obviously true yet incredibly difficult to prove.

The upshot is that, when you have your physicist hat on, you should take any such randomization result for granted. In particular, you take the ergodicity hypothesis for granted, when you can't find conserved quantities, and numerical integration shows that they are not present.

Equipartition

Once you understand the Boltzmann distribution, equipartition is very easy. Consider a mechanical system consisting of n oscillators, with Hamiltonian

$$H = A_{ij} p_i p_j + B_{ij} q_i q_j $$

Where A and B are two positive definite matrices, only the symmetric part of which is important, so take them symmetric. The canonical ensemble distribution for the states is, after a diagonalization rotation of p and q, a product of Gaussians:

$$ P(p,q) = \prod_i e^{- \beta( A_i p_i^2 + B_i q_i^2)} $$

And a direct calculation shows that the expected value of any $A_i p_i^2$ or $B_i q_i^2$ is just ${1\over 2\beta}$. This is the equipartition theorem: every mechanical oscillator in thermal equilibrium has $T/2$ of kinetic and $T/2$ of potential energy, and more generally, every quadratic term in the Hamiltonian carries $T/2$ on average.
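For a single quadratic term, the Gaussian average behind this is elementary:

$$ \langle A_i p_i^2 \rangle = {\int A_i p_i^2\, e^{-\beta A_i p_i^2}\, dp_i \over \int e^{-\beta A_i p_i^2}\, dp_i} = -{\partial \over \partial \beta} \log \int_{-\infty}^{\infty} e^{-\beta A_i p_i^2}\, dp_i = -{\partial \over \partial \beta} \log \sqrt{\pi \over \beta A_i} = {1\over 2\beta} $$

and identically for $B_i q_i^2$.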

Note that this result does not depend on the stiffness of the oscillator. If the oscillator is stiffer, so that $B_i$ is big, the oscillations are smaller in amplitude, but still carry the same energy. So if you have fast oscillators and slow ones, classically, they are only in equilibrium when they are at the same temperature, so that they oscillate with the same average energy.

For higher order terms, if the potential goes as $q^4$ or $q^8$, the result is different, but the kinetic energy is always quadratic in the momentum. In the limit of a $|q|^\infty$ potential, the box potential, there is zero average potential energy but still $T/2$ of kinetic energy. So the energy in a given mode is generically between $T/2$ and $T$, and always of the order of T.
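The general statement behind this $T/2$-to-$T$ range is the standard generalized-equipartition identity, obtained by integrating by parts against the Boltzmann weight (the boundary term vanishes for a confining potential):

$$ \left\langle q\, {\partial H \over \partial q} \right\rangle = {\int q\, \partial_q H \; e^{-\beta H}\, dq \over \int e^{-\beta H}\, dq} = {-{1\over\beta}\int q\, \partial_q\!\left(e^{-\beta H}\right) dq \over \int e^{-\beta H}\, dq} = {1\over \beta} $$

So for a potential $V = \lambda q^{2n}$, for which $q\,\partial_q V = 2n\,V$, the average potential energy is $T/2n$: $T/2$ for the harmonic case, $T/4$ for $q^4$, and $0$ in the box limit $n \to \infty$, while the kinetic term always contributes $T/2$.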

This result applies to any subsystem with a separated energy, so that the interactions between the subsystem and the rest of the system are small compared to the internal interactions, and the conditions of thermal equilibrium apply.

If certain oscillators are damped, these oscillators lose energy and become cold. Then energy flows in from the oscillators which are not damped, according to the internal thermal gradient. The description can sometimes be given by non-equilibrium thermodynamics, where you assume that different parts of the system are locally in equilibrium, but with a temperature which is different from part to part. For fluids, the flow at the smallest scales is always cold, because it is damped by viscosity, while you are stirring the largest scales, so these are always hot. The flow of energy from hot to cold unfortunately is not well described by non-equilibrium thermodynamics, because the assumption that each k region is in a local thermodynamic equilibrium is not true.

Nevertheless, this point of view is a useful first approximation to the turbulence problem.

Ultraviolet Catastrophe

For fields, the equipartition of energy leads to the famous ultraviolet catastrophe. To see this, you formulate the field statistical problem in k-space. I'll use a simple field where the Fourier description is obviously by coupled oscillators (this is not so simple for the Navier-Stokes equations, because any physical oscillation frequency depends crucially on the nonlinearity). I'll use quartic field theory, with Hamiltonian

$$ H = \int d^3x \left[ \sum_i {1\over 2} \left(|\Pi_i|^2 + |\nabla \phi_i|^2\right) + V(\phi) \right] $$

Where $\Pi_i = \dot{\phi_i}$ is the conjugate momentum to the field $\phi$, derived from the usual relativistic Lagrangian.

$$ V(\phi) = \sum_{i,j,k,l} \lambda_{ijkl} \phi_i \phi_j \phi_k \phi_l $$

Where the $\lambda$'s are chosen so that the potential is non-negative in every direction in field space. The quartic interaction will make the field system non-integrable in general, so that the field modes will all be coupled in a nonlinear way, which should lead to ergodic mixing in phase space, with an approach to equilibrium.

This gives a Boltzmann distribution on the phase space $(\phi,\Pi)$,

$$P(\phi,\Pi) = e^{-\beta \int d^3x \left[ \sum_i {1\over 2} \left(|\Pi_i|^2 + |\nabla \phi_i|^2\right) + V(\phi) \right]} $$

Which, ignoring the nonlinearity, is just a bunch of oscillators. In terms of the Fourier variables $\Pi(k),\phi(k)$, it is

$$ P(\phi,\Pi) = e^{-\beta \sum_{i,k} \left( {1\over 2} |\Pi_i(k)|^2 + {1\over 2} k^2 |\phi_i(k)|^2 \right) - \beta V(\phi)} $$

Which is quadratic in the momentum and the position, ignoring the potential V, and so gives the equipartition for field modes--- each k mode has energy ${1\over \beta}$ in thermal equilibrium, split evenly between potential and kinetic energy. The thermal equilibrium state at any temperature consists of fluctuating fields with a divergent amount of energy, which, if you cut off k at a maximum wavenumber $\Lambda$, diverges as $\Lambda^3$. This is just like the vacuum energy problem in quantum fields, except here it is physical--- a classical field cannot reach thermal equilibrium. If you have total initial energy E in a volume V and cutoff $\Lambda$, you will equilibrate at a temperature which is something like $T = E/(V\Lambda^3)$, which goes to zero as the cutoff is removed. The system just dumps all the energy into progressively smaller wavelengths, dividing it into smaller and smaller parcels in the process.
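The counting behind this is just the Rayleigh-Jeans estimate (factors of order one depend on how the modes of a real field are counted): each k mode carries energy T, and the number of modes per unit volume below the cutoff is the usual phase-space integral,

$$ {E\over V} \approx T \int_{|k|<\Lambda} {d^3k \over (2\pi)^3} = {T\,\Lambda^3 \over 6\pi^2} \ \ \text{per field component}, \qquad \text{so} \qquad T \sim {E \over V\,\Lambda^3}. $$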

I ignored the quartic term completely in the above analysis, but it is easy to see that the quartic nonlinearity can't affect the result too much. At worst, it can drive the equilibrium potential energy in a given mode to 0, leaving only the ${1\over 2\beta}$ of kinetic energy in the mode. It doesn't affect the kinetic energy at all.

Really, the quartic term doesn't have to affect the potential energy very much, because you can tune $\lambda$ close to 0, in which case it will slow down the mixing time between different modes, but it will still lead to thermalization over time, which still sucks the energy out of any large wavelength modes into the smallest wavelength modes, in accordance with the ultraviolet catastrophe expectations.

This observation, that classical fields suck all the energy into the tiniest wavelength modes, is due to Einstein, Rayleigh, and Jeans. There is some history of science literature on the proper attribution of this result, due to Thomas Kuhn, which says that the result is new in 1905, and did not motivate Planck. I am not sure if this analysis is correct (for a probably inaccurate discussion, see the talk page of the Wikipedia article on ultraviolet catastrophe--- warning, I didn't read the original literature, Kuhn did, and Kuhn says Planck didn't care about equipartition, but I think Planck and others knew about it anyway). The inability of fields to get to thermal equilibrium famously led to quantum mechanics, but it is also an important insight for the problem of classical turbulence.

Classical turbulence should be viewed as the attempt of a classical field to thermalize in those situations where it is constantly driven by an energy input, and where the energy leaks to smaller wavelengths in an irreversible way, because there are infinitely many modes down there. If the energy is constantly replenished, you end up with a steady state energy flow downward in k space. The k-space energy flow is somewhat analogous to the flow of heat along a thermally conducting material from a hot reservoir to a cold reservoir. It's not exactly analogous, because the local concept of temperature is more iffy.

Nonequilibrium thermodynamics of turbulent mixing

Suppose that you drive the field at low k with a stirring force, and suppose that the stirring force produces low-k modes in local thermal equilibrium, so that the low-k modes are populated according to the Boltzmann distribution, but the high-k modes are not populated at all. The low-k modes then try to thermalize the higher-k modes. The thermalization process depends on the nonlinear mixing, so you have to examine how that works.

In Fourier space (transforming the space but not the time), the nonlinear term is

$$ \sum_{k_1 k_2 k_3 k_4} \lambda_{ijkl} \phi_i(k_1) \phi_j (k_2) \phi_k(k_3) \phi_l(k_4) \delta(k_1 + k_2 + k_3 + k_4) $$

The delta function enforces translation invariance, and it is important, because it says that modes of size |k| can only interact with each other to make modes of size bounded by 3|k| at worst, and typically only of size 2|k|. This means that the flow of energy is sort-of local in k-space, because the mixing nonlinearity can't push the energy in one step from small |k| to very large |k|; it can only add something of order $\log 2$ to the log of the size of k. This is obviously true for any polynomial term in a nonlinear equation.
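The bound is just the triangle inequality applied inside the momentum delta function:

$$ |k_4| = |k_1 + k_2 + k_3| \le |k_1| + |k_2| + |k_3| \le 3\,\max(|k_1|,|k_2|,|k_3|), $$

so three modes of wavenumber at most $|k|$ can only source a mode of wavenumber at most $3|k|$.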

If you assume that the nonlinear coefficient $\lambda$ is small enough, then the modes interact perturbatively over many linear oscillation cycles, so that the condition for resonant interaction is that the energy is conserved as well. This is best expressed by using an additional delta-function in the "energy" (quantum mechanically, this is the conservation of energy--- this isn't quantum, so it is just a resonant frequency matching condition)

$$ \delta\big(\pm\,\epsilon(|k_1|) \pm \epsilon(|k_2|) \pm \epsilon(|k_3|) \pm \epsilon(|k_4|)\big) $$

where the signs depend on which modes are absorbed and which are emitted, and where, for the previous equation, $\epsilon(|k|) = |k|$. This condition is not so restrictive; it doesn't prevent a long-distance cascade in k-space. But if you change the equation to break Lorentz invariance (but not rotational invariance), and make $\epsilon(|k|) = k^{2N}$ where N is large, you can find a limit where the description of the turbulence is precisely a local flow of energy in k-space.

The reason is that the constraint of energy conservation, or resonant mode interaction, for large N requires that the length of the sum vector must be equal to the length of the longest vector of $|k_1|,|k_2|,|k_3|$, up to a small correction which goes as $1/N$ times the 2N-th power of the ratio of the second-longest k to the longest k. This means that the dynamics in k-space becomes entirely local, with k's only sourcing neighboring k's because far-away k's are not at all resonant, their frequency being completely different.
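To see the locality explicitly, write the resonance condition for the sourced mode as $\epsilon(|k_4|) = \epsilon(|k_1|) + \epsilon(|k_2|) + \epsilon(|k_3|)$ with $|k_1|$ the longest of the three source wavevectors and $|k_2|$ the second longest; then

$$ |k_4| = \left( |k_1|^{2N} + |k_2|^{2N} + |k_3|^{2N} \right)^{1/2N} \approx |k_1| \left( 1 + {1\over 2N} \left( {|k_2| \over |k_1|} \right)^{2N} + \cdots \right), $$

so unless the source wavevectors are nearly equal in magnitude, the sourced mode must sit exponentially close to the longest one.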

The resulting cascade for the dispersion relation $\epsilon(|k|) = k^{2N}$ with a weak quartic coupling is therefore described by a local thermal equilibrium with a temperature that depends only on |k|. This observation seems interesting, but I am not sure if it is new. In this model, it is straightforward to calculate all properties of the turbulent cascade from a thermal gradient on the k-space.