[Physics] Entropy as an arrow of time

Tags: arrow-of-time, entropy, soft-question, statistical-mechanics, thermodynamics

From what I understand, entropy is a concept defined by the experimentalist due to his ignorance of the exact microstate of a system. To say the number of accessible microstates $W$ of the universe is constantly increasing is nothing more than saying 'ignorance begets ignorance'.

I have often encountered the argument of ever increasing entropy for the presence of an inherent time asymmetry, most prominently in the works of Penrose. It just doesn't seem to make sense.

Let us imagine alien beings who are experiencing time in reverse. For them, increasing ignorance is in the direction of our decreasing ignorance. So how can perpetual increase in entropy indicate an 'arrow of time'?

One possible explanation (I thought of) for this argument was the fact that there may never be a mechanism to reduce ignorance. I will pose this as a question:

An observer determines the number of possible microstates of system+observer to be $W_0$ at time $t_0$. After improving his measurements, can he (at a later time) measure $W'$ (where $W_0>W'$) as the number of possible microstates? Assume that all the microstates are equally probable in this case.

Unless the above is false in general, how else can anyone claim that entropy reveals an arrow of time?

EDIT: I am performing this discussion for isolated systems (the universe or its parts, if relevant). The means I propose to reduce entropy in the present is redefinition of the macroscopic variables and of the microscopic models used to count the number of distinguishable microstates (that obviously yield the same macrostate). This is essentially an argument against the robustness of entropy via redefinition.

Best Answer

I have been thinking about your question for quite a few days, and I can't pretend that I fully grasp exactly what you're driving at. So I'll write down my thoughts; hopefully you can clear up any misconceptions I have and thus we can work together towards a worthwhile answer for you. I'll just give the thoughts I have in answer to each idea in your text.

From what I understand, entropy is a concept defined by the experimentalist due to his ignorance of the exact microstate of a system.

I agree with this one. One can summarise it by saying that, informally, the entropy can be thought of as the length of the smallest book one would need to write to "correct this ignorance", i.e. to define the system's state exactly once the macrostate is known.

To say the number of accessible microstates $W$ of the universe is constantly increasing is nothing more than saying 'ignorance begets ignorance'.

I don't fully agree with this. I'm going to plead my own ignorance for the whole universe, and look at everyday systems. Even so, there is a sense wherein your statement is partly true, and that is from a certain subjectivist view of probability and statistics (bear with me, I'm going to try to discuss this as fully as I can).


A "Random Walk" Through Phase Space

The simplest and best way to think of the law that "Entropy is always rising" is the one given in Chapter 27, "The Big Bang And Its Thermodynamic Legacy", of Roger Penrose's "Road To Reality". The following is a little different from his explanation, but I will bring the two explanations together in a few paragraphs' time.

I like to think of all this from the standpoint of the statistical law of large numbers, or, as I like to call it in the context of thermodynamics, the law of "Very Pointy Probability Distributions"! We begin by thinking of the simple binomial distribution and a sampling experiment wherein we draw, say, red and green balls from an infinite population in which the proportion of green ones is 0.4. As our sample gets bigger and bigger, the distribution of the sample proportion of green balls gets pointier and pointier around 0.4. The likely fractional error in assuming the sample proportion to be 0.4 gets smaller and smaller: at "thermodynamic" sample sizes it is utterly negligible, even though the absolute number of green balls in the sample varies a great deal and the probability of sampling exactly 0.4 is unbelievably small. Another way to see this is as follows: as the sample size rises, the overwhelming majority of sample "microstates" are very near the maximum entropy one wherein the proportion is exactly 0.4; for all practical purposes, they are the same as the maximum entropy one. There are, of course, possible arrangements where there is only one, none, or a few green balls, but the chances of drawing them become negligible.

The "pointiness" comes from applying Stirling's approximation to the binomial distribution: the mean of the distribution $x_0 = 0.4$ arises at the maximum entropy sample where $\partial_x p(x)|_{x = x_0} = 0$ (here $x$ is the actual sample proportion and $p(x)$ its probability distribution) and, by the very nature of Stirling's approximation, only the maximum entropy probability $p(x_0)$ and the second derivative of its logarithm (i.e. $\partial_x^2 \log p(x)|_{x=x_0}$, whose magnitude is the reciprocal of the variance) are important for defining the probability distribution, for all practical purposes, for very large samples.

The same pointy behavior arises for all of the canonical ensembles. You may have more constraints, such as the total number of particles being constant, the total energy being constant (for the microcanonical ensemble), and so forth, but Stirling's approximation works in exactly the same way: the probability distribution of actual arrangements, as a joint distribution defined on phase space, becomes a (highly) multivariate Gaussian that gets pointier and pointier with rising sample size, and the vast majority of arrangements end up looking just like the maximum entropy one even with the problem's constraints. In this case the maximum entropy arrangement is found by maximizing the number of arrangements consistent with a given macrostate subject to the number, energy, and other constraints the problem has, each constraint merely adding a new Lagrange multiplier. Such multipliers do not change the essential pointy nature of the solution as the sample size increases.
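
To make the "pointiness" concrete, here is a minimal numerical sketch of my own (not part of the original argument; the population proportion 0.4 is just the illustrative figure above): the relative spread of the sample proportion shrinks like $1/\sqrt{N}$, so at thermodynamic sample sizes virtually every sample is, for all practical purposes, the maximum entropy one.

```python
# Minimal sketch (mine): how "pointy" the distribution of the sample proportion of
# green balls becomes as the sample size N grows, for a true proportion of 0.4.
import math

p = 0.4  # assumed true proportion of green balls in the population

for N in (100, 10_000, 1_000_000, 10**20):
    # Standard deviation of the sample proportion for a binomial sample of size N.
    sigma = math.sqrt(p * (1 - p) / N)
    print(f"N = {N:>22d}: relative spread sigma/p = {sigma / p:.3e}")

# At "thermodynamic" sample sizes (N ~ 10^20 and beyond) the relative spread is
# utterly negligible: virtually every sample looks like the maximum entropy one.
```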

So if the system undergoes a "random walk" (this word holds fearsome subtleties which I shall speak more about) in phase space, wherever the walk may begin, it is almost certain to swiftly reach a microstate that looks very like the maximum entropy one. The idea that the system "seeks out its highest entropy state" or any other like ideas often given with the derivation of the canonical ensembles is of course preposterous! The system is mindless - it doesn't seek! It can't even spell "ENTROPY"! We just wind up near a maximum entropy state by a random walk because the canonical ensembles are so, well, canonical! They describe the overwhelming majority of arrangements well. For "thermodynamic" sample sizes, there are only arrangements very like the maximum entropy ones and there is almost nothing else.
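
If you want to see this random walk in action, a hedged toy illustration (my own choice of model, not one used in this answer) is the Ehrenfest urn model: balls hop at random between two urns, and starting from the wildly improbable "all balls in one urn" microstate, the entropy of the macrostate almost surely climbs towards its maximum and then hovers there.

```python
# Sketch (mine) of the "random walk through phase space" idea via the Ehrenfest urn
# model: N labelled balls in two urns; each step moves one randomly chosen ball to
# the other urn. Starting from the low-entropy "all balls in urn A" state, the
# entropy S = ln(number of microstates of the current macrostate) drifts up to
# near its maximum (units with k_B = 1).
import math
import random

random.seed(0)
N = 1000   # number of balls
k = N      # start with every ball in urn A: a fantastically improbable macrostate

def entropy(k, N):
    """ln of the number of microstates with k balls in urn A, i.e. ln C(N, k)."""
    return math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)

for step in range(5001):
    if step % 1000 == 0:
        print(f"step {step:5d}: k = {k:4d}, S = {entropy(k, N):7.2f} "
              f"(maximum is about {entropy(N // 2, N):.2f})")
    # Pick a ball uniformly at random; with probability k/N it sits in urn A.
    if random.random() < k / N:
        k -= 1   # the chosen ball moves A -> B
    else:
        k += 1   # the chosen ball moves B -> A
```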

So then, given the assumption that we begin with the system in one of those fantastically rare states well removed from the maximum entropy one, i.e. a microstate belonging to the "almost nothing else" above, the overwhelming probability is that the system's entropy must rise, simply by taking a "random walk" through phase space. This is, as far as I can understand, the mainstream explanation resolving the Loschmidt paradox (see references [1] and [2]), which is the name given to the paradoxical observation that entropy is increasing even though the physical laws are just as valid with time running backwards. Namely, the answer has to do with the "boundary conditions" of the universe: the universe was (observed fact) in an exquisitely low entropy state at the big bang, and so the overwhelmingly likeliest history is one where entropy rises with increasing time just by dint of the "random walk" argument I just made. But how and why that low entropy state arose is, as I understand it, one of the profound mysteries of modern physics.

Entropy As Information

Let's now look at this idea of "random walk" more keenly. There are two extremes wontedly thought about in thermodynamics: closed systems and those in contact with outside "reservoirs" - bodies at thermodynamic equilibrium which are so big that no amount of heat transferred between our system and its outside reservoirs changes the macroscopic state of the latter significantly for the purposes of analysis of the system at hand.

Naïvely, one might think that as the system evolves, one's ignorance of its innards would indeed increase, so that the length of our microstate defining book above steadily grows with time. However, there are only certain senses wherein this is true. Let's first talk about a closed bottle of ideal gas partitioned into halves by a door such that all the gas, in thermodynamic equilibrium at temperature $T$, is held in one bottle half whilst the other half is empty. At time $t=0$ we open the door: let's idealise the situation and assume the door simply vanishes, so the gas suddenly fills the whole bottle (together with a loud "THUNK"!). We'll think of three cases:

  1. The bottle is a truly closed system so that NO heat passes through its walls but also the walls behave as though they were at absolute zero temperature (the walls need to have zero conductivity for this), and moreover its insides are perfectly smooth and it has a simple, readily described geometry;

  2. As in case 1, but now the bottle's inside surface is a realistic, jagged shape set by the imperfect arrangements of the molecules making up the bottle. The molecules are still at effectively absolute zero temperature and there is zero conductivity: no energy passes between these molecules and the gas molecules, and the latter bounce elastically off the former;

  3. The bottle walls are an ideal reservoir made of thermalized molecules, also at the same temperature $T$ as those of the gas inside.

For another take on like problems, see my answer to [4] below, but for now, in talking about these three cases, it's useful to talk about two different concepts of entropy, as defined in Edwin Jaynes's paper, reference [5] at the end of my answer. These are the Gibbs and Boltzmann entropies, or, as I like to call them, (1) the informational / Shannon entropy and (2) the experimental entropy, respectively (no disrespect to Gibbs or Boltzmann meant). The experimental / Boltzmann entropy is the Shannon entropy calculated from the marginal state-assignment probability distribution for each molecule and then multiplied by the number of molecules, i.e. it is calculated as though there were no correlation between molecules, whilst the informational / Gibbs entropy is the Shannon entropy calculated for the joint probability distribution of the states of the whole system of molecules at once.

It can be shown that, in many systems, the experimental entropy above is the same entropy we would get by applying Clausius's objective definition ($\mathrm{d}S = \mathrm{d}Q_{\mathrm{rev}} / T$) and making macroscopic measurements, either directly on a system (such as by measuring the volume, temperature and pressure of an ideal gas) or on a system throughout a certain controlled state history, as in measuring the heat taken up as a function of temperature as a system warms from near absolute zero to, say, 300 K, which is conceptually how one gets a molar entropy of formation for a substance.

Only sometimes is the experimental entropy equal to the informational entropy. For an ideal gas, for example, they are equal if and only if the states of the gas's constituent molecules are statistically uncorrelated. This is the content of Boltzmann's "Stosszahlansatz" (the molecular chaos assumption; the word itself translates roughly as "collision number ansatz"), and the defect in Boltzmann's reasoning that leads to the Loschmidt paradox (see references [1] and [2] below) can be summarised in Jaynes's words: once a system has left a perfectly uncorrelated state, one has to explain how the correlations are destroyed before the Stosszahlansatz can be brought to bear again. Boltzmann's H-theorem (see the "Boltzmann's H Theorem" section of the Wikipedia "H Theorem" page), for example, assumes the Stosszahlansatz can always be applied, but in reality, if the system reaches a microstate where it truly applies, any further collisions correlate the states of the molecules involved! The H-theorem derivations thus strictly fail because they do not tell us how these correlations are destroyed. Even so, the H-theorem is a useful idea, especially from the subjectivist interpretation I'll talk about below.

The difference between the experimental and informational entropies is called the mutual information in information theory and is computed from the statistical correlations between the molecular states.
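
A tiny worked example may help pin these definitions down; the two-molecule toy system below is my own illustration, not something taken from Jaynes's paper. When the molecules' positions are perfectly correlated, the experimental (Boltzmann) entropy charges a full bit per molecule that the informational (Gibbs) entropy does not, and the difference is exactly the mutual information.

```python
# Sketch (mine): experimental (Boltzmann) vs informational (Gibbs) entropy for two
# molecules that can each sit in the Left or Right half of the bottle, with their
# positions perfectly correlated.
import math

def shannon(probs):
    """Shannon entropy in nats, ignoring zero-probability outcomes."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Joint distribution over (molecule 1, molecule 2) positions: perfectly correlated.
joint = {("L", "L"): 0.5, ("R", "R"): 0.5, ("L", "R"): 0.0, ("R", "L"): 0.0}

# Marginal distribution of each molecule on its own: 50/50 Left/Right.
marginal = [0.5, 0.5]

S_boltzmann = 2 * shannon(marginal)          # treat the molecules as uncorrelated
S_gibbs = shannon(joint.values())            # use the full joint description
mutual_information = S_boltzmann - S_gibbs   # always >= 0

print(f"experimental (Boltzmann) entropy : {S_boltzmann:.4f} nats")
print(f"informational (Gibbs) entropy    : {S_gibbs:.4f} nats")
print(f"mutual information (difference)  : {mutual_information:.4f} nats")
```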

So let's look at the first alternative above. If the volume occupied by $N$ molecules of ideal gas irreversibly doubles at a steady temperature $T$ (i.e. the pressure halves), then the experimental (Boltzmann) entropy rises by $N\,k_B\,\log 2$. Naïvely, one might equate this (modulo the Boltzmann constant) to the rise in informational entropy too; after all, each molecule now needs one further bit of information to describe which half of the bottle it is in. However, this is not so. At a microscopic level, the fundamental laws of physics are reversible, so that one can in principle compute any former state of a system from the full knowledge of any future state and contrariwise - no information gets lost. So if we knew the exact velocities and positions of the molecules before the door opened, we can compute them at any time afterwards. Actually, the gas's informational entropy rises a teeny-tiny bit, because one must also know a full description of the bottle's inside geometry; however, the geometry is smooth and simple, so this new information is utterly negligible compared with the information content of the gas states. Now, though, Boltzmann's molecular chaos (the Stosszahlansatz) no longer applies. The states of the gas molecules are statistically correlated, the mutual information arising from these correlations accounts for the considerable difference ($N\,k_B\,\log 2$) between the informational and experimental entropies and, given the highly idealized nature of the bottle walls, there is no way for these statistical correlations to be destroyed (again, see my description of the weirdly idealized gas in my answer to [4] below).
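
For a feel of the numbers (my own back-of-envelope figures, assuming one mole of gas), the experimental entropy rise $N\,k_B\,\log 2$, and its informational reading of "one extra bit per molecule", come out as follows:

```python
# Rough numbers (mine) for the free doubling of volume of one mole of ideal gas:
# the experimental entropy rises by Delta S = N * k_B * ln 2.
import math

k_B = 1.380649e-23    # Boltzmann constant, J/K
N_A = 6.02214076e23   # Avogadro's number

delta_S = N_A * k_B * math.log(2)            # J/K per mole
print(f"Delta S = {delta_S:.3f} J/(K mol)")  # about 5.76 J/(K mol)

# In informational terms that is one extra bit per molecule, i.e. about 6e23 bits.
print(f"extra bits needed to say which half each molecule is in: {N_A:.3e}")
```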

Now let's look at the second alternative. The explanation of the two entropies is much the same as above, aside from the fact that the bottle is now highly complicated. One needs a great deal of information to specify the unknown, jagged microstructure of the walls, which will certainly influence the future paths of the gas molecules. The informational entropy of the gas alone is going to rise significantly by dint of the gas "probing more of the outside universe", namely the new surfaces in the bottle, whose full description is not included in the gas entropy. The correlations between molecular states I spoke of before are still there, but they are weaker, and thus so is the difference between the experimental and informational entropies. Indeed, if the bottle surface is jagged and complicated enough, the correlations might be destroyed altogether and the Stosszahlansatz restored. So we have a curious situation where no heat passes through the bottle walls, yet the original system is still not closed, by dint of new parts of the universe (i.e. the walls of the formerly empty bottle half) weighing on the molecular states.

Now to the last alternative. Everything happens as above, but now the gas is in contact with thermalized walls. The gas-wall molecular interactions therefore randomize the gas molecular states, the correlations between molecular states are swiftly destroyed and, as far as the gas system is concerned, the Stosszahlansatz is restored. If you like, information can pass from the reservoir outside into the gas system, and this information must be accounted for to fully define the microstate.

So you can see that in many practical situations the informational and experimental entropies differ wildly only for short times, and both tend to increase as the system under consideration couples, either directly or indirectly, with more and more of the outside universe.

A Random Walk Through Correlated Phase Space

So now let's go back to our random walk argument. The random walk argument works if the initial experimental entropy of the system is far enough below the maximum entropy and if all arrangements and microstates in question are equally probable; this is sometimes called the ergodic hypothesis. What happens when we get the weird correlations I have described, though? They mean that the ergodic hypothesis is no longer true: some microstates are now more likely than others. But it seems plausible that, over large enough scales in phase space, the differing probabilities for neighbouring microstates "average out", so that whilst a large enough chunk of phase space contains microstates with widely diverse probabilities owing to the correlations, such larger chunks of phase space are still equiprobable phase space volumes. It seems highly plausible that the random walk argument I gave is robust in this way: as long as there is some not-too-large scale by which we can "coarse grain" the phase space without disturbing the macrostate, then the "random walk" will still mean what we intuitively think it does and the system will still tend to wander into the overwhelmingly probable maximum-entropy-like microstates.

This highly plausible assumption ultimately needs to be experimentally verified. The fact that the second law of thermodynamics is experimentally true lends experimental weight to the above coarse-graining argument.

Another way to look at this is that the informational entropy defines the volume of the set in phase space which, by dint of a macroscopic measurement, we can say our system must lie within. Given the Stosszahlansatz, we can hypothesise a "normal"-looking, simply connected (or at least one with a not too wild fundamental group), maybe even convex set in phase space. By Liouville's theorem (see [5]) the volume of this set does not change - this is another way of saying that the informational entropy does not change. But there is nothing to say that the set doesn't become "foamy" or fractal-like. Imagine an expanded polyurethane foam football as it is being made. Initially, the unexpanded polyurethane has a small volume (the analogue of the informational entropy). The volume of polyurethane does not change as the football is made in its mould (just as the informational entropy is constant), but when you coarse grain the space around it a bit, it seems to have taken on a vastly bigger volume (the experimental entropy). We are reminded here of the topological notion of denseness: the rational numbers have measure nought yet they are dense in the real line: you can't scoop up any teeny tiny interval and not have rationals within it. Likewise, even though the set in phase space defining the macrostate doesn't strictly change in volume (informational entropy), it becomes effectively dense (especially when coarse grained appropriately) in something of much bigger volume (experimental entropy).
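
The "constant fine-grained volume, growing coarse-grained volume" picture can also be sketched numerically. The snippet below is my own illustration under a big simplifying assumption: the area-preserving Arnold cat map on the unit torus stands in for a genuine Liouville (Hamiltonian) flow. The ensemble's point count (a proxy for the fine-grained volume, and hence the informational entropy) never changes, but the entropy computed on a coarse grid climbs to its maximum as the initially compact blob goes foamy.

```python
# Sketch (mine): a measure-preserving evolution keeps the fine-grained "volume"
# fixed while the coarse-grained entropy rises. Points start in a tiny patch of
# the unit torus and are evolved by the area-preserving Arnold cat map; the number
# of points never changes, but the entropy over a coarse grid climbs to its maximum.
import math
import random

random.seed(0)
n_points, n_cells = 100_000, 20   # ensemble size and a 20 x 20 coarse-graining grid

# Initial ensemble: a small square of side 0.05, a low coarse-grained entropy state.
pts = [(random.uniform(0, 0.05), random.uniform(0, 0.05)) for _ in range(n_points)]

def coarse_entropy(pts):
    """Shannon entropy (nats) of the occupation fractions of the grid cells."""
    counts = {}
    for x, y in pts:
        cell = (int(x * n_cells), int(y * n_cells))
        counts[cell] = counts.get(cell, 0) + 1
    return -sum(c / len(pts) * math.log(c / len(pts)) for c in counts.values())

for step in range(8):
    print(f"step {step}: coarse-grained S = {coarse_entropy(pts):6.3f} "
          f"(max = {math.log(n_cells ** 2):.3f}), points = {len(pts)}")
    # Arnold cat map: area-preserving, so the fine-grained volume is untouched.
    pts = [((x + y) % 1.0, (x + 2 * y) % 1.0) for x, y in pts]
```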

A last but important reason the experimental entropy matters is the maximum entropy principle, or the Gibbs algorithm. In the subjectivist framework of thought for interpreting probabilities, described in the opening pages (up to and including section 2) of Jaynes's classic work in reference [6], the use of the experimental entropy is justified because practically there is no experimental way we can access the subtle and complex correlations spoken of above. Therefore, the experimental entropy is the only unbiased entropy we can assign; to do anything else would be to assert that we can reduce uncertainty about the system's state with further information that we simply do not have. This is the principle of the Gibbs algorithm (see the Wikipedia page with this name), but Jaynes was the first to justify it as a replacement for Laplace's principle of insufficient reason, on the grounds that Shannon's entropy is the unique quantity fulfilling a simple, compelling set of axioms defining what we would reasonably think of as a measure of uncertainty. So we adopt whichever hypothesis both leaves the residual uncertainty maximized and is consistent with whatever knowledge we have of the system.
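
Here is a minimal sketch of the Gibbs algorithm under assumptions of my own choosing (four hypothetical energy levels, with a prescribed mean energy as the only macroscopic knowledge): maximizing the Shannon entropy subject to that single constraint yields the Boltzmann distribution $p_i \propto e^{-\beta E_i}$, the Lagrange multiplier $\beta$ being fixed by the constraint.

```python
# Sketch (mine) of the maximum entropy principle / Gibbs algorithm for a toy system:
# among all distributions over the energy levels with a prescribed mean energy, the
# entropy-maximizing one is the Boltzmann distribution p_i ~ exp(-beta * E_i).
# Here beta is found by simple bisection on the mean-energy constraint.
import math

energies = [0.0, 1.0, 2.0, 3.0]   # hypothetical energy levels (arbitrary units)
target_mean_E = 0.8               # the only macroscopic knowledge we assume we have

def mean_energy(beta):
    weights = [math.exp(-beta * E) for E in energies]
    Z = sum(weights)
    return sum(w * E for w, E in zip(weights, energies)) / Z

# Mean energy decreases monotonically as beta increases, so bisection works.
lo, hi = -50.0, 50.0
for _ in range(200):
    mid = (lo + hi) / 2
    if mean_energy(mid) > target_mean_E:
        lo = mid
    else:
        hi = mid
beta = (lo + hi) / 2

Z = sum(math.exp(-beta * E) for E in energies)
p = [math.exp(-beta * E) / Z for E in energies]
S = -sum(q * math.log(q) for q in p)
print(f"beta = {beta:.4f}")
print(f"maximum entropy distribution = {[round(q, 4) for q in p]}")
print(f"mean energy = {mean_energy(beta):.4f}, entropy = {S:.4f} nats")
```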

If we wanted to do more and more, finer and finer experiments to try to discover the correlations described in our thought experiment, we'd end up contradicting the point of doing thermodynamics anyway - we'd be working with something that is more and more like the system's microstate than its macrostate! This last statement is perhaps less true than in Jaynes's day, for, as I understand it, much of the research into thermodynamics nowadays IS done with very small, highly nonequilibrium systems wherein statistical fluctuations are highly important; the measurements on these systems are extremely detailed and the idea of the microstate seems much less inaccessible than it would have seemed to Jaynes.

So ultimately, we have some theoretical hunches - things like the random walk argument and the coarse graining argument - together with experimental justification: observation of the big bang, and experimental verification of the second law of thermodynamics in our laboratories.

Ultimately the second law of thermodynamics is an experimental fact that cannot be rigorously proven by theory - the Loschmidt paradox and the Poincaré Recurrence Theorem are the ultimate undoing of any "rigorous" theoretical programme.


Further Questions

I have often encountered the argument of ever increasing entropy for the presence of an inherent time asymmetry, most prominently in the works of Penrose. It just doesn't seem to make sense.

The chapter of the "Road To Reality" I cited above very much explained the second law of thermodynamics for me in the way I have tried to argue above: i.e. that it is about the universe's boundary conditions. I have not read Penrose's later work on these ideas, but I understand he does now believe there is an inherent time asymmetry that explains the origin of the exquisitely low entropy state that was the big bang. I'll have to plead ignorance on this one too.

Let us imagine alien beings who are experiencing time in reverse. For them, increasing ignorance is in the direction of our decreasing ignorance. So how can perpetual increase in entropy indicate an 'arrow of time'?

I like the idea of some alien whom time runs backwards for, but what would such a thing mean? Presumably a creature with complex self-awareness, such that he or she feels complex thought states like emotions in response to both the world around them and imagined worlds, and feels pleasure in fulfilling his or her evolutionarily begotten needs, such as enjoying a scratch from a fellow alien to soothe an itch, lying in the sun of their respective star when they are cold, or sitting around reading absurd words from little boxes wired together. From a physics standpoint, our alien is a fantastically complicated system, so I would seriously doubt one could make any serious progress with a thought experiment until one can characterize our alien more fully. I'm not meaning to sound flippant here: people used to speculate about the Maxwell Daemon as though it were a complex, conscious being, and there the idea sat for many decades - no one able to further an idea grounded on something so complex - from the time Maxwell built his Daemon out of thoughts to illustrate the second law's statistical nature until people like Szilard, Landauer and Bennett understood that the Daemon could be built from very simple, finite state machines. Once the unneedful complexity had been stripped away, Landauer and Bennett in particular achieved profound insights into the nature of information and the thermodynamics of computation. In our universe, information cannot be disembodied, abstract strings, even though in information and probability theory it is often useful to think of it as such. In our universe information must be written in some kind of "Real Ink", and that ink is the states of physical systems. The computations done by, and the information gleaned by, the finite state machine Maxwell Daemon must end up encoded in the surrounding system once this information is forgotten by the Daemon, and this leads to an overall entropy balance, or increase, for the gas + Daemon + surroundings system. See reference [7] below. Nowadays we even build and test real Maxwell Daemons in the laboratory and use them to study Landauer's principle and the rest of thermodynamics experimentally. See reference [8] below.
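
To put one number on the Landauer/Bennett point (a back-of-envelope figure of my own, not taken from the references): erasing, i.e. forgetting, one bit of the Daemon's memory at temperature $T$ must dump at least $k_B T \ln 2$ of heat into the surroundings, and it is this that restores the overall entropy balance.

```python
# Back-of-envelope sketch (mine) of Landauer's bound: erasing one bit of memory at
# temperature T dissipates at least k_B * T * ln 2 of heat into the surroundings.
import math

k_B = 1.380649e-23   # Boltzmann constant, J/K
T = 300.0            # roughly room temperature, K

E_min = k_B * T * math.log(2)
print(f"Minimum heat dissipated per erased bit at {T:.0f} K: {E_min:.2e} J")  # ~2.9e-21 J
```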

So, returning to your example: if you further probe what "time running backwards" means for, say, simpler, finite state machines, you might very well find that the idea is contradictory. Or maybe you'd find that, if workable, it would not lead to a contradiction because the alien was sundered from us by a spacelike separation. Or that, if the alien could visit us at some time in the far future, even though his or her part of the universe were highly different from our presently observable universe, our arrows of time might become aligned as the alien came near to us. Indeed, I have seen speculations (they used to be on the Wikipedia "Entropy (Arrow of Time)" page but they have vanished) that the reason why we recall the past but not the future is that this is the direction of time wherein our minds are coupling with ever greater parts of the universe, as in our gas thought experiment above. This is all pure speculation of course, but it shows that you really need to simplify such ideas before furthering them.

One possible explanation (I thought of) for this argument was the fact that there may never be a mechanism to reduce ignorance. I will pose this as a question:

An observer determines the number of possible microstates of system+observer to be $W_0$ at time $t_0$. After improving his measurements, can he (at a later time) measure $W'$ (where $W_0>W'$) as the number of possible microstates? Assume that all the microstates are equally probable in this case.

Unless the above is false in general, how else can anyone claim that entropy reveals an arrow of time?

As I said, I think that one has to further analyse the idea of an alien whom time would run backwards for before one can be sure that such an idea is even sound, let alone whether it provokes genuine contradictions or paradoxes.


References

Relevant Physics SE Questions

  1. Does the scientific community consider the Loschmidt paradox resolved? If so what is the resolution?

  2. Theoretical proof forbidding Loschmidt reversal?

  3. The statistical nature of the 2nd Law of Thermodynamics

  4. How does a gas of particles with uniform speed reach the Maxwell-Boltzmann distribution?

Papers

  5. E. T. Jaynes, "Gibbs vs Boltzmann Entropies", Am. J. Phys. 33, no. 5, pp. 391-398, 1965

  6. E. T. Jaynes, "Information Theory and Statistical Mechanics", Phys. Rev. 106, no. 4, pp. 620-630, 1957, as well as many other of E. T. Jaynes's works in this field

  7. Charles Bennett, "The Thermodynamics of Computation: A Review", Int. J. Theoretical Physics, 21, no. 12, 1982

And a remarkable experiment that actually BUILDS AND TESTS the Maxwell Daemon.

  8. Shoichi Toyabe; Takahiro Sagawa; Masahito Ueda; Eiro Muneyuki; Masaki Sano (2010-09-29). "Information heat engine: converting information to energy by feedback control". Nature Physics 6 (12): 988–992. arXiv:1009.5287. Bibcode:2011NatPh...6..988T. doi:10.1038/nphys1821.

    "We demonstrated that free energy is obtained by a feedback control using the information about the system; information is converted to free energy, as the first realization of Szilard-type Maxwell’s demon."

