Kolmogorov Complexity and the Shannon entropy of an information source have different definitions, which I assume you have read: if not, the Wikipedia pages on Kolmogorov complexity, Shannon entropy and Shannon's source coding theorem are essential and sound background reading, as is E. T. Jaynes, "Information Theory and Statistical Mechanics" (especially its first section). The Jaynes article, like the Maths Stack Exchange answer you cite, gives a pretty good summary of the relationship between the Shannon and Gibbs entropies: the Boltzmann entropy of an ensemble of statistically independent subsystems (or a gas of particles whose states are statistically independent) equals the Gibbs entropy in that case (where statistical independence holds); the two differ otherwise, and the Boltzmann entropy is simply a number calculated from the particle states *as though they were statistically independent*. E. T. Jaynes, "Gibbs vs. Boltzmann Entropies" discusses the difference between the Boltzmann and Gibbs entropies in detail. The point is that the Boltzmann entropy is a different concept from the information-theoretic one, but it is "accessible" (i.e. it can be calculated from macroscopic measurements of a system, and as such is equivalent to Clausius's macroscopic definition of entropy), and there are quite strong qualitative arguments as to why it should "realise" the second law of thermodynamics. True informational entropy, by contrast, *cannot* change for a deterministic system like the classical statistical-mechanical gas, even through irreversible changes, as long as the gas does not come into contact with some other system: its state at one time then wholly determines its state at all other times.

So now we look at the Kolmogorov complexity of a system. This is the length in bits of the shortest possible description of the system *with respect to a given descriptive language*. The K-complexity is always cited with a descriptive language in mind; it cannot be otherwise, even if this is not explicitly stated. Again, for the gas or system ensemble whose constituents truly are statistically independent, the most succinct description of the constituents' states is simply one that names each state in full: knowledge of one particle's state tells us nothing about that of the others (otherwise they wouldn't be statistically independent, so the situation where naming all the particles' states in full is the shortest possible description can be taken as a definition of statistical independence, if you like). In this case, the Kolmogorov complexity per particle and the Shannon entropy approach one another as the number of particles or substates approaches infinity.
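A crude way to feel this out in practice: any off-the-shelf compressor gives an *upper bound* on description length, so it can stand in (very loosely) for the descriptive language. A minimal sketch, with zlib as the stand-in and the data purely illustrative - independent random bytes admit essentially no shorter description, while structured data of the same length compresses enormously:

```python
import random
import zlib

random.seed(0)
n = 10**5
# n statistically independent, uniformly random bytes: essentially incompressible
iid = bytes(random.getrandbits(8) for _ in range(n))
# highly structured data of the same length (all zeros): hugely compressible
structured = bytes(n)

print(len(zlib.compress(iid, 9)) / n)         # ~1.0 byte/byte: no structure to exploit
print(len(zlib.compress(structured, 9)) / n)  # ~0.001 byte/byte: structure exploited
```

The paragraphs below make the independent case precise.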

To understand why this is so in the statistically independent case, note that as the number $N$ of particles gets very large, the number of them in each possible state gets very near to $p_j\,N$, where $p_j$ is the probability that a given particle will be in the $j^{th}$ state, and the proportional error between the actual number in state $j$ for a sample of $N$ of them and the number $p_j\,N$ approaches nought as $N\to\infty$ (see my answer to the Physics SE question "Why are the laws of thermodynamics “supreme among the laws of Nature”?", where I discuss this in detail).
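A quick numerical sketch of this convergence (the probabilities here are purely illustrative assumptions): simulate $N$ independent particles and watch the worst proportional error shrink.

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.25, 0.125, 0.125])  # assumed single-particle state probabilities

for N in (10**2, 10**4, 10**6):
    # draw N statistically independent particle states
    states = rng.choice(len(p), size=N, p=p)
    counts = np.bincount(states, minlength=len(p))
    # worst proportional error |n_j - p_j N| / (p_j N) over the states
    print(N, np.max(np.abs(counts - p * N) / (p * N)))
```

The worst-case proportional error falls roughly like $1/\sqrt{N}$, which is the law-of-large-numbers behaviour invoked above.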

So this is how we shall specify the state in detail in a message. We shall need a "foreword" to our message, which tells us the number of particles $n_j$ in each state, without saying which particles are in that state. This is simply a number between $0$ and $N$ for each of a finite number $J$ of possible states. So we need

$$F_1 = J \log_2 N$$

bits to do this. Once we have the numbers $n_j$ in each state, we now have our main message naming which particles are in which state. We can easily see that the fewest number of bits needed is $\log_2 \Omega$, where $\Omega$ is the total number of ways of arranging $N$ particles into partitions of sizes $n_1,\,n_2,\,\cdots$. We assign one distinct integer to each possible arrangement: simply naming this integer tells us exactly which arrangement we have. We cannot use fewer than $\Omega$ codewords, because then the relationship between the set of all arrangements and the codewords could not be one-to-one (this is simply a restatement of the pigeonhole principle). So, without knowing how to encode the arrangements, we can see there is some coding that will encode the message in about $\log_2 \Omega$ bits. But now:

$$\Omega = \frac{N!}{n_1!\,n_2!\,\cdots}$$

and by the Stirling approximation we get

$$\log_e \Omega \approx N\log_e N - N - \sum\limits_{j=1}^J {(n_j \log_e n_j -n_j)} = -\sum\limits_{j=1}^J {n_j \log_e \frac{n_j}{N}}$$
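One can check this approximation numerically; here is a minimal sketch (the probabilities and occupation numbers are illustrative assumptions) comparing the exact $\log_2 \Omega$, computed via the log-gamma function, against the Stirling-approximation form above:

```python
from math import lgamma, log, log2

def log2_omega(counts):
    """Exact log2 of the multinomial N! / (n_1! n_2! ...), via log-gamma."""
    N = sum(counts)
    return (lgamma(N + 1) - sum(lgamma(n + 1) for n in counts)) / log(2)

p = [0.5, 0.25, 0.125, 0.125]            # assumed state probabilities
for N in (10**2, 10**4, 10**6):
    n = [round(pj * N) for pj in p]      # occupation numbers n_j ~ p_j N
    exact = log2_omega(n)
    stirling = -sum(nj * log2(nj / N) for nj in n)
    print(N, exact / N, stirling / N)    # both -> 1.75 bits per particle
```

Both per-particle figures converge on $-\sum_j p_j \log_2 p_j = 1.75$ bits for these probabilities, anticipating the limit taken next.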

Now we know, by the law of large numbers, that $\log(N p_j /n_j)\to 0$ as $N\to\infty$, so in the limit of large $N$ the shortest message size in bits per particle is:

$$\frac{\log_2\Omega + F_1+F_2}{N} \approx \frac{F_1+F_2}{N} - \left(\sum\limits_{j=1}^J {\frac{n_j}{N} \log_2 \frac{n_j}{N}}\right)\to-\sum\limits_{j=1}^J p_j \log_2 p_j$$

bits per symbol, which is the Shannon entropy. Here we have had to add to our "foreword" a description $F_2$ bits long of how the descriptive code integer $1\cdots\Omega$ maps to the different arrangements. But this is a slowly growing overhead: it can be shown that its length in bits can be arranged to grow only logarithmically with $N$. So we see that the Kolmogorov complexity and the Shannon entropy come to mean the same thing in many important cases in the thermodynamic limit, as the law of large numbers takes hold and makes $n_j = N\,p_j$ effectively exact, and as the descriptional "overhead" of the foreword is spread over more and more particles.
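As a concrete check of the limit (using the same illustrative probabilities as in the sketches above, $p = \left(\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{8}, \tfrac{1}{8}\right)$):

$$-\sum_{j=1}^4 p_j \log_2 p_j = \tfrac{1}{2}\cdot 1 + \tfrac{1}{4}\cdot 2 + \tfrac{1}{8}\cdot 3 + \tfrac{1}{8}\cdot 3 = 1.75\text{ bits per particle},$$

while the foreword costs only, say, $F_1 = J \log_2 N = 4 \times 20 = 80$ bits for $N = 2^{20}$ particles, i.e. about $7.6\times 10^{-5}$ bits per particle; since $F_1$ and $F_2$ both grow only logarithmically, $(F_1+F_2)/N \to 0$ as $N\to\infty$.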

This ends the answer, but it is worth noting that you can work the above into a rigorous proof of Shannon's noiseless coding theorem: http://en.wikipedia.org/wiki/Shannon%27s_source_coding_theorem. If you try to code a message comprising a string of statistically independent symbols using $H-\epsilon$ bits per symbol (where $H$ is the Shannon entropy of the symbol probability distribution), for any $\epsilon>0$ no matter how small, then the probability of "failure" (i.e. the probability that you will "garble" the message) rises to unity as the message length $\to\infty$. Conversely, if you choose to use $H+\epsilon$ bits per symbol, then it can be proven that there exists a coding scheme such that the probability of failure shrinks to nought as the message length rises without bound. Shannon's definition of entropy is given its "working" practical meaning through the noiseless coding theorem. Jaynes also gives statistical mechanical interpretations of the Shannon entropy in the opening sections of E. T. Jaynes, "Information Theory and Statistical Mechanics".
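As a small operational illustration of the achievability direction, a Huffman code (a standard textbook technique, not part of the answer above) already brings the average codeword length within one bit of $H$ when coding single symbols; coding longer blocks of symbols squeezes this down towards $H + \epsilon$. A minimal sketch with assumed symbol probabilities:

```python
import heapq
from math import log2

def huffman_lengths(p):
    """Codeword lengths of a binary Huffman code for probabilities p."""
    # heap entries: (subtree probability, tie-break id, symbols in subtree)
    heap = [(pj, j, [j]) for j, pj in enumerate(p)]
    heapq.heapify(heap)
    lengths = [0] * len(p)
    uid = len(p)
    while len(heap) > 1:
        pa, _, a = heapq.heappop(heap)
        pb, _, b = heapq.heappop(heap)
        for j in a + b:              # every symbol in the merged subtree gains one bit
            lengths[j] += 1
        heapq.heappush(heap, (pa + pb, uid, a + b))
        uid += 1
    return lengths

p = [0.4, 0.3, 0.2, 0.1]             # assumed symbol probabilities
L = huffman_lengths(p)
H = -sum(pj * log2(pj) for pj in p)
avg = sum(pj * lj for pj, lj in zip(p, L))
print(H, avg)                        # H <= avg < H + 1; here ~1.846 vs 1.9
```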

## Best Answer

I guess so - I mean, as far as I know, there's no law of physics that strictly prohibits those "exotic" states from being realized. As long as the state exists and can be reached by some path from the "center" of the state space where the likely states are, there should be a nonzero (not even infinitesimal, really) probability of accessing it. But for a typical system, that probability is really, really, *really* small - so small that it's impossible to intuitively comprehend just how unlikely such an event is.

The thing is, a lot of people aren't used to dealing with even moderately large or small numbers. If you confront them with a probability like $10^{-10^{23}}$, they often fail to put the smallness of that value in perspective, and instead focus on the fact that it's not strictly equal to zero. From there they may start coming up with all sorts of nonsensical ideas about walking through walls and spontaneous combustion (the weird kind) and the like. So physicists usually find it easier to just say the probability is zero - and in fact, for any purpose other than a rigorous mathematical proof, it might as well be.

(Sorry about the rant, I know most people are actually relatively sensible about these things, but it bothers me that the crazy ones seem to get all the attention despite being wrong.)