Several posts and my classes in thermodynamics equate increase in entropy with loss of information. Shannon clearly showed that the information content of a message is zero when its entropy is zero and that its information content increases with increasing entropy. So entropy increase leads to more information, which is consistent with the evolution of the universe from a disordered plasma to one that contains lots of order. Why does physics continue to get the relationship between entropy and information backwards?
[Physics] Entropy and Information
entropy, information
Related Solutions
Ultimate physical motivation
Strictly in the sense of physics, entropy is less free than it might seem. It always has to provide a measure of the energy exchange of a system that is not captured by the macroscopic parameters, i.e. it has to satisfy the relation $${\rm d}U = {\rm d}E_\text{macro} + T {\rm d} S$$ It has to carry all the forms of energy that cannot be expressed macroscopically, which we summarize as "heat", although the actual physics behind this "heat" may be quite different from the familiar notions in gases etc. If a quantity does not satisfy this relation, it is not a physical entropy. This is a full characterization of entropy for macrophysics, and I am going to use only this definition, not the cases where entropy is merely a handle to talk about information.
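As a minimal sanity check (taking the monatomic ideal gas as an illustrative example, not part of the argument itself), the familiar ideal-gas entropy indeed satisfies this relation with ${\rm d}E_\text{macro} = -p\,{\rm d}V$: $$S(U,V) = N k_B\left(\tfrac{3}{2}\ln U + \ln V\right) + \text{const} \;\Rightarrow\; T\,{\rm d}S = \tfrac{3}{2}\tfrac{N k_B T}{U}\,{\rm d}U + \tfrac{N k_B T}{V}\,{\rm d}V = {\rm d}U + p\,{\rm d}V,$$ using $U = \tfrac{3}{2}N k_B T$ and $pV = N k_B T$, so that ${\rm d}U = -p\,{\rm d}V + T\,{\rm d}S$.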
Statistical formulation
This constraint does leave some apparent freedom in the statistical definition of entropy, but in effect none. The freedom lies basically in the fact that we take the $N\to \infty$ and $V \to \infty$ limits, in which a lot of the detail of the definition gets smeared out. For example, we can define the phase-space volume of the microcanonical ensemble in three distinct ways. The first is $$\Omega_\text{sharp} = \int_{\sum E = U} d \mu$$ where $\mu$ is some kind of measure over the space of states. Or we can put $$\Omega_\text{non-sharp} = \int_{\sum E \in (U-\varepsilon,U)} d \mu$$ or even $$\Omega_\text{nobody cares} = \int_{\sum E < U} d \mu$$ Any of these will work in $S = k_B \log \Omega$ in the mentioned limits (the limit gives the same $S$). But this is more a relic of the large limits - the physically plausible option is $\Omega_\text{sharp}$.
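Below is a small numeric sketch of this statement, assuming the ideal-gas scaling $\Omega \propto U^{3N/2}$ for the region $E<U$ and dropping all constants and the position-space factor (which only shift $S$ by an additive constant):

```python
import numpy as np

# Numeric illustration (an added sketch, not part of the original argument):
# for an ideal gas, the momentum-space volume with total kinetic energy below U
# scales as U**(3N/2), the thin shell (U - eps, U) as a difference of two such
# powers, and the sharp surface as the derivative (3N/2) * U**(3N/2 - 1).
# Working with log-volumes avoids overflow.

def entropies_per_particle(N, U=2.0, eps=1e-3):
    k = 1.5 * N                                                # exponent 3N/2
    log_below = k * np.log(U)                                  # Omega_'nobody cares'
    log_shell = log_below + np.log1p(-((U - eps) / U) ** k)    # Omega_'non-sharp'
    log_sharp = np.log(k) + (k - 1.0) * np.log(U)              # Omega_'sharp'
    return log_below / N, log_shell / N, log_sharp / N

for N in [10, 1_000, 100_000, 10_000_000]:
    below, shell, sharp = entropies_per_particle(N)
    print(f"N={N:>10}: below={below:.5f}  shell={shell:.5f}  sharp={sharp:.5f}")
# All three values of S/(N k_B) converge to the same number as N grows, which is
# the statement that the choice of Omega does not matter in the large-N limit.
```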
The much more important issue is counting the number of relevant states, the transition from discrete states to continuous ones and why we should consider them "democratic". This would be a very long argument involving ergodicity and so on.
For ergodic Hamiltonian systems, the probability measure is certainly proportional to $d^n x\, d^n p$, where $n$ is the number of degrees of freedom. From quantum mechanics we know that the "democracy" factor relating discrete to continuous states makes this measure $d^n x\, d^n p/h^n$, with $h$ the Planck constant. (Only the relative weights matter, since we normalize anyway.)
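For concreteness (the ideal gas is again an added example), applying this measure to $N$ free identical particles in a volume $V$, using the convenient $\Omega_{E<U}$ variant, gives $$\Omega = \frac{1}{N!}\int \frac{d^{3N}x\, d^{3N}p}{h^{3N}}\;\Theta\!\Big(U - \sum_i \tfrac{p_i^2}{2m}\Big) = \frac{V^N}{N!\,h^{3N}}\,\frac{(2\pi m U)^{3N/2}}{\Gamma\!\big(\tfrac{3N}{2}+1\big)},$$ and $S = k_B\log\Omega$ then reproduces the Sackur-Tetrode entropy in the large-$N$ limit; the $h^{3N}$ and the $1/N!$ for identical particles are exactly the "democracy" factors fixed by quantum mechanics.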
The conclusion is that the procedures of statistical physics, for a given system, can give us entropy unambiguously (up to an additive constant representing the freedom of state normalization).
Hand-wavy conclusion
So there is always one entropy for every situation, and we know how to derive it. The trick is only to specify which degrees of freedom are "free", i.e. get randomized in a complicated interaction, and then turn on the statistics.
But there are some loopholes. We see that the justification of the whole procedure (the "democratization" of states) relies on the Hamiltonian formulation and, basically, also on quantization. But we know quantization is more of an art than a science, and the statistical procedure can run into problems very similar to those of quantization. Are we always sure what the macroscopic parameters of a system are? How do we describe the situation when we observe the microstate directly? What would be the entropy of a relativistic space-time? Which would be the "activated" degrees of freedom? Etc. But this is a question for the "art of physics".
Additional note: "Art of physics" - modelling and confirming
A brief comment on "the art of physics". As with any physical model or approximation, there are three criteria:
- Foundation on (more) elementary physics
- Self-consistence of result with assumption
- Empirical verification
Say we have an open system $\Xi$ with a channel of particle inflow. However, we only know how to compute the parameters relevant for the inflow at small number densities in $\Xi$, because only then can we use a one-particle model of particles entering and leaving the system. The one-particle model is point 1. - a foundation on physics believed to be more elementary. We thus presume a low number density and compute the statistics of the system.
But the theorist's work should not stop there; the next step is to check whether the density is indeed sufficiently low for every choice of parameters, and to identify the regions of parameter space where it is - this is point 2. However, this is still a rather primitive check. For a serious model, the theorist should at least verify that two- and higher-particle models of inflow cannot suddenly take over even at low densities, and investigate under what conditions they do not. This is point 1. mixing with point 2.
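A purely hypothetical sketch of what such a consistency scan might look like (the inflow model, parameter ranges and cutoff below are invented for illustration only):

```python
import numpy as np

# Hypothetical self-consistency check: scan the external parameters (here a
# reservoir temperature T and chemical potential mu in reduced units), use the
# assumed one-particle model to predict the number density in the system, and
# flag where the "low density" assumption actually holds.

def predicted_density(T, mu):
    # invented stand-in for the one-particle model's steady-state prediction,
    # a dilute-gas-like n ~ exp(mu / T)
    return np.exp(mu / T)

DENSITY_CUTOFF = 1e-2   # arbitrary definition of "sufficiently dilute"

for T in np.linspace(0.5, 5.0, 5):
    mus = np.linspace(-10.0, 0.0, 11)
    ok = predicted_density(T, mus) < DENSITY_CUTOFF
    if ok.any():
        print(f"T={T:4.2f}: assumption holds for mu <= {mus[ok].max():.1f}")
    else:
        print(f"T={T:4.2f}: assumption never holds on this grid")
# Only inside the flagged region is the computed entropy self-consistent;
# outside it, the one-particle model contradicts its own assumption.
```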
Nevertheless, there is also point 3. - empirical verification. It would be very naïve to pretend that the theorist is able to anticipate all possible effects. In fact, Einstein's papers are well known for putting forward a model without long mathematical discussion of neglected effects and giving experimental predictions right away. Sometimes intuition rules (and sometimes it does not).
In the case of entropy, this would be achieved by measuring the heat response of the system. It is not only heat capacities of the form $$C_{\ldots} = T \left(\frac{\partial S}{\partial T}\right)_{\ldots\,=\,\text{const}}$$ but also a lot of other response coefficients involving temperature, as specified e.g. by the Maxwell relations.
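As a toy illustration of how such measurements pin the entropy down (assuming a constant $C_V = \tfrac{3}{2}Nk_B$ of a monatomic ideal gas as input, which is my example), the entropy difference between two temperatures follows from integrating $C/T$:

```python
import numpy as np

# Toy reconstruction of Delta S from a heat response: integrate the assumed
# heat capacity C(T)/T between two temperatures and compare with the exact
# (3/2) N k_B ln(T2/T1) for a monatomic ideal gas at constant volume.

k_B = 1.380649e-23     # J/K
N = 6.02214076e23      # one mole of particles

def heat_capacity(T):
    return 1.5 * N * k_B * np.ones_like(T)    # C_V, assumed T-independent here

T = np.linspace(200.0, 400.0, 100_001)         # K
integrand = heat_capacity(T) / T
dS_numeric = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(T))
dS_exact = 1.5 * N * k_B * np.log(400.0 / 200.0)
print(f"Delta S (numeric integral): {dS_numeric:.4f} J/K")
print(f"Delta S (exact formula):    {dS_exact:.4f} J/K")   # both about 8.64 J/K
```

It is agreement of such measured responses with the model's prediction that promotes a candidate quantity to "the" entropy of the system.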
So the answer would be: if a well-developed model quantitatively predicting the entropy exists and is confirmed by thorough testing, that entropy qualifies as the unique entropy of the system.
Additional note: Observed mathematical conditions
Let's say the physical motivation is paramount. Then the strongest statements we can make are the following:
- Entropy is a single-valued function of the full set of macroscopic parameters. (If it is not, that may be because the list of parameters is incomplete.)
- Entropy has a finite difference between any two points in the macro parameter space. I.e. $|\Delta S|<\infty$.
- Entropy is homogeneous in the parameters identified by physical criteria as "extensive". I.e. for a complete set of extensive parameters $A_i$ we have $S(\lambda A_1, ...,\lambda A_n, ...) = \lambda S(A_1,...,A_n,...), \;\forall \lambda > 0$.
In phase transitions as common as freezing/melting the entropy is even discontinuous, hence the criterion asks only for a finite difference rather than continuity. (But this happens only in the $N \to \infty$ limit, as discussed e.g. by Kardar in his lecture notes.) Physically we are able to measure only $\Delta S$, so a strict requirement of a well-defined ${\rm d}S$ is both redundant and impossible for some very common systems.
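For instance, at the melting of ice the measured jump is just the latent heat over the transition temperature (standard textbook numbers, quoted here only for illustration): $$\Delta S_\text{melt} = \frac{L}{T_m} \approx \frac{334\ \mathrm{kJ/kg}}{273\ \mathrm{K}} \approx 1.2\ \mathrm{kJ/(kg\,K)},$$ a perfectly finite $\Delta S$ even though $\partial S/\partial T$ does not exist at $T_m$.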
It is important that "extensivity" is just saying "take another copy of the system" - the parameters which double under this operation are extensive, and so is the heat stored in the new "doubled" system. Taking all the extensive parameters and multiplying them by $\lambda$ just means "taking $\lambda$ copies of the system". This all relies heavily on the fact that we are able to very clearly identify the physical operation of "taking another copy of the system".
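As a sketch of that scaling on the ideal-gas example used above, the large-$N$ entropy takes the form $$S = N k_B\left[\ln\frac{V}{N} + \frac{3}{2}\ln\frac{U}{N} + \text{const}\right],$$ which manifestly obeys $S(\lambda U, \lambda V, \lambda N) = \lambda\,S(U,V,N)$; dropping the $1/N!$ would replace $\ln(V/N)$ by $\ln V$ and destroy this homogeneity (the Gibbs paradox).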
There are cases, such as black hole thermodynamics, where this notion fails. In a way, the whole space-time is the thermodynamical system, so "take another copy of the system" is hard to specify. (More technically, the formulas hold for isolated black holes, and there is no way to screen out gravity other than by infinite distance.) It might seem that the horizon area $A$ would be an extensive parameter, but it actually grows as $\sim M^2$ - we cannot just say "double the mass", because that would not work.
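Concretely, with the standard Bekenstein-Hawking formulas (added here only for illustration), $$S_\text{BH} = \frac{k_B c^3 A}{4 G\hbar}, \qquad A = \frac{16\pi G^2 M^2}{c^4} \;\Rightarrow\; S_\text{BH} \propto M^2,$$ so doubling the mass quadruples the horizon area and the entropy instead of doubling them; no rescaling of candidate "extensive" parameters reproduces the operation of taking two copies.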
Best Answer
You have to be careful when thinking about this. For example, you talk about "the entropy of a message", but what could that mean? Shannon's entropy is a property of a probability distribution, but a message isn't a probability distribution, so a message does not in itself have an entropy.
The entropy only comes in when you don't know which message will be sent. For example: suppose you ask me a question to which the possible answers are "yes" and "no", and you have no idea what my answer will be. Because you don't know the answer, you can use a probability distribution: $p(\text{yes})=p(\text{no})=1/2,$ which has an entropy of one bit. Thus when I give my answer, you receive one bit of information. On the other hand, if you ask me a question to which you already know the answer, my reply gives you no information. You can see this by noting that the probability distribution $p(\text{yes})=1; \,\,p(\text{no})=0$ has an entropy of zero.
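If it helps to see the numbers, here is a minimal computation of the Shannon entropy of those two distributions (just the standard definition, nothing more):

```python
import math

def shannon_entropy(probs):
    # Shannon entropy in bits: H = -sum p * log2(p), with 0 * log(0) taken as 0
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))   # 1.0 -> one full bit gained from the answer
print(shannon_entropy([1.0, 0.0]))   # 0.0 -> the answer tells you nothing new
```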
Now, in these examples the entropy is equal to the information gained - but in a sense they are equal and opposite. Before you receive the message there is entropy, but afterwards there is none. (If you ask the same question twice you will not receive any more information.) The entropy represents your uncertainty, or lack of information about the message, before you receive it, and this is precisely why it is equal to the amount of information that you gain when you do receive the message.
In physics it is the same. The physical entropy represents a lack of information about a system's microscopic state. It is equal to the amount of information you would gain if you were to suddenly become aware of the precise position and velocity of every particle in the system* --- but in physics there is no way that can happen. Measuring a system can give us at most a few billion bits (usually far fewer), but the entropy of a macroscopically sized system is a lot larger than this, of the order of $10^{23}$ bits or more.
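To see where a number like that comes from (a mole of gas is my illustrative example), divide a typical thermodynamic entropy by $k_B \ln 2$ to express it in bits: $$\frac{S}{k_B\ln 2}\sim\frac{10^{2}\ \mathrm{J/K}}{(1.38\times 10^{-23}\ \mathrm{J/K})\times 0.693}\sim 10^{25}\ \text{bits},$$ hopelessly beyond anything a measurement could deliver.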
The second law of thermodynamics arises because there are a lot of ways we can lose information about a system, for example if the motions of its particles become correlated with the motions of particles in its surroundings. This increases our uncertainty about the system, i.e. its entropy. But the only way its entropy can decrease is if we make a measurement, and this decrease in entropy is typically so small it can be neglected.
If you would like to have a deep understanding of the relationship between Shannon entropy and thermodynamics, it is highly recommended that you read this long but awesome paper by Edwin Jaynes.
* or, if we're thinking in terms of quantum mechanics rather than classical mechanics, it's the amount of information you would gain if you made a measurement such that the system was put into a pure state after the measurement.