(i) Is the above characterisation of the motivation for Tsallis entropy correct, or are there cases where the parts of a system can be statistically independent and yet we still need a non-extensive entropy?
The one example I can think of that fits this description is a collisionless (or at least weakly collisional) plasma, like the solar wind.
Over scales larger than the Debye length, the system behaves in a collective manner but the collisionless nature of the gas keeps it from reaching equilibrium. Further, even though electromagnetic fields produce long-range interactions, the "parts" of the system (e.g., Debye spheres) can still be statistically independent. This allows a collisionless plasma to behave according to a non-extensive kinetic theory.
(ii) What is the current consensus on the validity of Tsallis entropy-based approaches to statistical mechanics? I know that it's been the subject of debate in the past, but Wikipedia seems to imply that this is now settled and the idea is now widely accepted. I'd like to know how true this is.
I think the validity of Tsallis entropy is generally accepted, at least in space plasma physics [e.g., see Livadiotis, 2015]. Support for a non-Maxwell-Boltzmann theory arose from the continual observation of velocity distributions with power-law tails and the near-total absence of purely Maxwellian distributions. Initial attempts to model these distributions used superpositions of Maxwellians with modified Lorentzian distributions (similar to Cauchy distributions) [e.g., Feldman et al., 1983; Thomsen et al., 1983]. Later studies [e.g., Maksimovic et al., 1997] resurrected an old form called the kappa distribution, originally derived by Vasyliunas [1968]. Eventually, Leubner [2002] showed the connection between the kappa distribution and the Tsallis distribution when $\kappa = -1/\left( q - 1 \right)$, where $q$ is the entropic index from Tsallis statistics. (Note that the kappa distribution is itself a member of the modified Lorentzian family.)
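To make the shape difference concrete, here is a minimal numerical sketch. The functional forms below are the standard unnormalised 1D shapes; the parameter names (`theta`, and the choice `kappa = 3`) are illustrative assumptions, and the $q$ mapping is taken directly from the $\kappa = -1/(q-1)$ relation above:

```python
import numpy as np

def maxwellian(v, theta=1.0):
    """Unnormalised 1D Maxwellian shape, exp(-v^2/theta^2)."""
    return np.exp(-(v / theta) ** 2)

def kappa_dist(v, kappa=3.0, theta=1.0):
    """Unnormalised 1D kappa-distribution shape,
    [1 + v^2/(kappa*theta^2)]^-(kappa+1); recovers the
    Maxwellian in the limit kappa -> infinity."""
    return (1.0 + v ** 2 / (kappa * theta ** 2)) ** (-(kappa + 1.0))

def q_from_kappa(kappa):
    """Tsallis entropic index from the mapping kappa = -1/(q - 1)."""
    return 1.0 - 1.0 / kappa

# At five thermal speeds the kappa tail dwarfs the Maxwellian tail,
# which is why suprathermal power-law tails are fit so much better.
v = 5.0
print(kappa_dist(v) / maxwellian(v))  # ratio of many orders of magnitude
```

The power-law tail of the kappa distribution falls off algebraically rather than exponentially, which is the observational signature mentioned above.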
More recently, a great deal of work has begun to solidify the relationship between kappa distributions, Tsallis statistics, and fundamental thermodynamics, attempting to merge traditional statistical mechanics with its non-extensive counterpart [e.g., Livadiotis, 2015; Treumann and Baumjohann, 2014, 2016].
While some in the community remain hesitant, the fact that nearly all particle velocity distributions observed to date in collisionless space plasmas are modeled more accurately by kappa distributions than by Maxwellians is strong support for Tsallis statistics.
Finally, (iii) can the argument I sketched above be found in the literature? I had a quick look at some dissenting opinions about Tsallis entropy, but surprisingly I didn't immediately see the point about mutual information and the non-extensivity of Gibbs-Shannon entropy.
The long-range interactions and the collisionless nature of some plasmas cause these systems to remain continually out of equilibrium. This type of system requires a non-extensive formalism, as Leubner [2002] states:
Any extensive formalism fails whenever a physical system includes long-range forces or long-range memory. In particular, this situation is usually found in astrophysical environments and plasma physics where, for example, the range of interactions is comparable to the size of the system considered. A generalized entropy is required to possess the usual properties of positivity, equiprobability, concavity and irreversibility but suitably extending the standard additivity to nonextensivity...
References
- Feldman, W.C., et al., "Electron Velocity Distributions Near the Earth's Bow Shock," Journal of Geophysical Research 88(A1), pp. 96--110, doi:10.1029/JA088iA01p00096, 1983.
- Leubner, M.P. "A Nonextensive Entropy Approach to Kappa-Distributions," Astrophysics and Space Science 282(3), pp. 573--579, doi:10.1023/A:1020990413487, 2002.
- Livadiotis, G. "Introduction to special section on Origins and Properties of Kappa Distributions: Statistical Background and Properties of Kappa Distributions in Space Plasmas," Journal of Geophysical Research: Space Physics 120(3), pp. 1607--1619, doi:10.1002/2014JA020825, 2015.
- Maksimovic, M., et al., "Ulysses electron distributions fitted with Kappa functions," Geophysical Research Letters 24(9), pp. 1151--1154, doi:10.1029/97GL00992, 1997.
- Thomsen, M.F., et al., "Stability of Electron Distributions Within the Earth's Bow Shock," Journal of Geophysical Research 88(A4), pp. 3035--3045, doi:10.1029/JA088iA04p03035, 1983.
- Treumann, R.A. and W. Baumjohann "Beyond Gibbs-Boltzmann-Shannon: general entropies—the Gibbs-Lorentzian example," Frontiers in Physics 2(49), pp. 1--5, doi:10.3389/fphy.2014.00049, 2014.
- Treumann, R.A. and W. Baumjohann "Generalised partition functions: inferences on phase space distributions," Annales Geophysicae 34(6), pp. 557--564, doi:10.5194/angeo-34-557-2016, 2016.
- Vasyliunas, V.M. "A survey of low-energy electrons in the evening sector of the magnetosphere with OGO 1 and OGO 3," Journal of Geophysical Research 73(9), pp. 2839--2884, doi:10.1029/JA073i009p02839, 1968.
The first thought to come to mind upon reading this is the Bekenstein-Hawking entropy of a black hole, which relates the entropy of a black hole to the area of its event horizon (which is in turn defined by its mass/energy). If we want to connect this black hole entropy to information, some people have argued that this arises from quantum entanglement. My knowledge of this is rough, but the linked arXiv article may help.
I link entanglement with information because, for entangled quantum systems, the degree of entanglement quantifies the information we have (or lack) about the system's parts.
However, we can also take a classical thermodynamic / statistical mechanical view of the question. In this case we have the quantities of entropy $S$ and internal energy $U$, which are related via the first law: $dU = TdS + dW$ (with $dW$ the work done on the system). This version of entropy, $S=k_B\ln{\Omega}$, where $\Omega$ is the number of possible states the system can be in, is related to the Shannon entropy (measured in bits) by the constant factor $k_B\ln{2}$. If we want to connect matter to information, we might talk about individual particles (e.g., those that make up a gas), in which case the multiplicity of their possible states determines the entropy of the full system $S$; $S$ is then a measure of our uncertainty about the state of each unit of matter. There is also a competition between entropy $S$ and internal energy $U$: at low temperatures the system settles into a state where $U$ is minimized, while at high temperatures $S$ is maximized.
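As a quick sanity check on that constant factor, a minimal sketch (the helper names are my own):

```python
import math

k_B = 1.380649e-23  # Boltzmann constant, J/K (exact SI value)

def boltzmann_entropy(omega):
    """Thermodynamic entropy S = k_B * ln(Omega) for Omega equiprobable microstates."""
    return k_B * math.log(omega)

def shannon_bits(omega):
    """Shannon entropy in bits of a uniform distribution over Omega states."""
    return math.log2(omega)

# The two entropies differ by exactly the factor k_B * ln(2):
omega = 2 ** 20
assert math.isclose(boltzmann_entropy(omega),
                    k_B * math.log(2) * shannon_bits(omega), rel_tol=1e-9)
```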
Thermodynamics is well defined only in equilibrium, but let us consider a system that moves arbitrarily from state A to state B. There is a free-energy difference between states A and B, and the work involved in moving the system from A to B can be greater than that free-energy difference (e.g., due to friction). By rewriting the first law, we see that this lost ("dissipated") work must be accounted for by an increase in entropy. So you could equate the work $W$ of driving a process with an increase in uncertainty $S$. But the task of defining entropy for such dynamic processes is an area of active study, so we need to be careful with our words.
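A sketch of that bookkeeping for an isothermal process at temperature $T$, assuming the usual free energy $F = U - TS$:

$$W = \Delta F + W_{\mathrm{diss}}, \qquad W_{\mathrm{diss}} \geq 0, \qquad \Delta S_{\mathrm{tot}} = \frac{W_{\mathrm{diss}}}{T} \geq 0,$$

so any work done beyond the free-energy difference is dissipated and shows up as an entropy increase.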
But all of this depends on context: what is the energy or information of interest? In terms of your two direct questions, we see from the classical thermodynamic description that the entropy $S$ is indeed related to the average number of bits needed to describe the system (the Shannon entropy). The second point, a restriction on information density, can be inferred from the Bekenstein-Hawking / entanglement entropy of a black hole (the size of the black hole sets the limit).
Best Answer
Boltzmann's entropy formula can be derived from the Shannon entropy formula when all states are equally probable.
Say you have $W$ equiprobable microstates, each with probability $p_i=1/W$. Then:
$S=-k\sum_i{p_i \ln p_i}=-k\sum_{i=1}^{W}{\frac{1}{W}\ln\frac{1}{W}}=k\sum_{i=1}^{W}{\frac{\ln W}{W}}=k\ln W$
Another way to obtain this result is by maximising $S$ subject to the normalisation constraint $\sum_i{p_i}=1$ with a Lagrange multiplier. Define
$\mathcal{L} = -k\sum_i{p_i \ln p_i} - \lambda\left(\sum_i{p_i}-1\right)$
Setting $\partial \mathcal{L}/\partial p_i = -k\left(\ln p_i + 1\right) - \lambda = 0$ gives a $p_i$ that is the same for every $i$, and normalisation then fixes $p_i = 1/W$, recovering $S = k\ln W$.
Adding more constraints results in a lower-entropy distribution (the canonical distribution when an energy constraint is added, and the grand canonical when both energy and particle-number constraints are added).
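A small numerical illustration of this effect (a sketch: the toy energy levels and the inverse temperature `beta` are arbitrary assumptions):

```python
import numpy as np

W = 8
energies = np.arange(W, dtype=float)  # toy, evenly spaced energy levels

def entropy(p):
    """Shannon/Gibbs entropy in nats (k = 1), ignoring zero probabilities."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Normalisation constraint only -> uniform (microcanonical) distribution
p_uniform = np.full(W, 1.0 / W)

# Adding a mean-energy constraint -> canonical (Gibbs) distribution
beta = 0.5  # arbitrary inverse temperature
p_canonical = np.exp(-beta * energies)
p_canonical /= p_canonical.sum()

print(entropy(p_uniform))    # = ln(8) ~ 2.079
print(entropy(p_canonical))  # strictly smaller
```

The canonical distribution concentrates probability on low-energy states, and that concentration is exactly what lowers the entropy below $k\ln W$.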
As a side note, it can also be shown that the Boltzmann entropy is an upper bound on the entropy a system with a fixed number of microstates can have, meaning:
$S\leq k \ln W$
This can also be interpreted as the uniform distribution being the maximum-entropy (or, equivalently, least-informative) distribution; someone was kind enough to prove this for me here: https://math.stackexchange.com/questions/2748388/proving-that-shannon-entropy-is-maximal-for-the-uniform-distribution-using-conve.
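A quick numerical check of the bound, taking $k = 1$ so the claim reads $S \leq \ln W$ (the sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
W = 16

def entropy(p):
    """Shannon entropy in nats, ignoring zero probabilities."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Random probability vectors over W microstates never exceed ln(W) ...
samples = rng.random((1000, W))
samples /= samples.sum(axis=1, keepdims=True)
assert all(entropy(p) <= np.log(W) + 1e-9 for p in samples)

# ... and the uniform distribution attains the bound exactly.
assert abs(entropy(np.full(W, 1.0 / W)) - np.log(W)) < 1e-12
```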