It seems odd that entropy is usually only defined for a system on a single 'slice' of time or spacelike region. Can one define the entropy of a system occupying a 4d region of spacetime, in such a way that it yields a codimension-one definition agreeing with the usual one when the codimension-one slice is spacelike?

# [Physics] four-dimensional definition of entropy

entropy · special-relativity · statistical-mechanics

#### Related Solutions

First of all, we must distinguish between two things that are called entropy. There's a microscopic entropy, also called Shannon entropy, which is a functional over the possible probability distributions you can assign to a given system:

$\displaystyle H[p] = -\sum_{x \in \mathcal{X}}\; p(x) \log(p(x))$

where $\mathcal{X}$ is the set in which your variable $x$ takes values. And there's a "macroscopic entropy", which is merely the value of the functional above evaluated on a specific family of distributions parametrized by some variable $\theta$:

$S(\theta)=-\sum_{x \in \mathcal{X}}\; p(x|\theta) \log(p(x|\theta))$
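To make the distinction concrete, here is a minimal sketch (my own Python example, not part of the original answer) of the Shannon functional $H[p]$: it takes any distribution over a finite set $\mathcal{X}$ and returns a number, maximized by the uniform distribution.

```python
import math

def shannon_entropy(p):
    """Shannon functional H[p] = -sum_x p(x) log p(x), in nats.

    `p` maps each outcome in X to its probability; terms with
    p(x) = 0 contribute nothing (0 log 0 = 0 by convention).
    """
    return -sum(px * math.log(px) for px in p.values() if px > 0)

# The uniform distribution over 4 outcomes attains the maximum, log 4;
# a sharply peaked distribution has much lower entropy.
uniform = {x: 0.25 for x in "abcd"}
peaked = {"a": 0.97, "b": 0.01, "c": 0.01, "d": 0.01}
```

Evaluating this functional on a parametrized family $p(x\mid\theta)$ is exactly what turns it into the "macroscopic entropy" $S(\theta)$ above.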

Now, what happens in thermodynamics and equilibrium statistical physics is that you have a specific family of distributions to substitute in the first expression: the Gibbs equilibrium distribution:

$p(x | V, T, N) = \frac{1}{Z}e^{-\frac{E(x)}{T}}$

where, as an example, the parameters are the volume, temperature and number of particles (in units where $k_B = 1$), and $E(x)$ is the energy of the specific configuration $x$. If you substitute this specific family of distributions into $H[p]$, what you get is the thermodynamic equilibrium entropy, and this is what physicists usually call entropy: a state function depending on the parameters of the Gibbs distribution (as opposed to a functional that associates a real value with each possible choice of distribution). Now, to find the appropriate physical equilibrium for this system when those parameters are allowed to vary, you must maximize this entropy (1).
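As an illustrative sketch (my own toy example, not part of the original argument), plugging the Gibbs distribution of a two-level system into the Shannon formula gives the thermodynamic entropy $S(T)$; with $k_B = 1$ it vanishes as $T \to 0$ and approaches $\log 2$ as $T \to \infty$:

```python
import math

def gibbs_entropy(energies, T):
    """S(T): Shannon entropy of the Gibbs distribution
    p(x|T) = exp(-E(x)/T) / Z, in units where k_B = 1."""
    weights = [math.exp(-E / T) for E in energies]
    Z = sum(weights)  # partition function
    probs = [w / Z for w in weights]
    return -sum(p * math.log(p) for p in probs if p > 0)

# Two-level system with energies 0 and 1: at low T the ground state
# dominates (S ~ 0); at high T both states are equally likely (S ~ log 2).
```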

Now here it's common to make the following distinction: $x$ is a microscopic variable that specifies the detailed configuration of the system, while $V$, $T$ and $N$ are macroscopic parameters. It doesn't have to be so. In the specific case of statistical physics, the distribution function arises because there are so many degrees of freedom that it's impossible (and even undesirable) to follow them all, so we settle for a statistical description. Under these assumptions it's natural to expect the distribution to be over microscopic variables with macroscopic parameters. But this is not the only reason one might use a distribution function.

You could have other sources of ignorance. As an example, consider the following problem: suppose we recently discovered a new planet in a solar system containing two other planets. Its position $\vec{x}$ and velocity $\vec{v}$ at a given instant $t = 0$ have been measured within some precision $\sigma_x$ and $\sigma_v$. Let's assume that the possible sources of error in the measurements are additive. Then it's reasonable to assume a Gaussian probability distribution for the position and velocity of the planet:

$\displaystyle p(\vec{x}(0), \vec{v}(0) | \sigma_x, \sigma_v) =\frac{1}{Z} \exp\left(-\frac{|\vec{x}(0)|^2}{2\sigma_x^2} -\frac{|\vec{v}(0)|^2}{2\sigma_v^2} \right)$

where $Z$ is a normalization constant. Now suppose we want to predict this planet's position in the future, given the current positions of the other planets and their uncertainties. We would have the distribution:

$\displaystyle p(\vec{x}(t), \vec{v}(t) | \vec{x}_i(0), \vec{v}_i(0), \sigma_{x,i},\sigma_{v,i})= p(\vec{x}(0), \vec{v}(0) | \sigma_x, \sigma_v)\prod_{i=1}^{2} p(\vec{x}_i(0), \vec{v}_i(0) | \sigma_{x,i},\sigma_{v,i}) \times p(\vec{x}(t), \vec{v}(t) | \vec{x}(0), \vec{v}(0),\vec{x}_1(0), \vec{v}_1(0), \vec{x}_2(0), \vec{v}_2(0))$

where $p(\vec{x}(t), \vec{v}(t) | \vec{x}(0), \vec{v}(0),\vec{x}_1(0), \vec{v}_1(0), \vec{x}_2(0), \vec{v}_2(0))$ would take Newton's equations of motion into account. Note that there's a small number of particles here: just 3. And the only source of "randomness" is the fact that I don't know the positions and velocities precisely (for a technological reason, not a fundamental one: I have limited telescopes, for example).

I can substitute this distribution into the definition of entropy and calculate a "macroscopic entropy" that depends on time and on the positions, velocities and measurement precisions of the other planets:

$S(t, \vec{x}_i, \vec{v}_i,\sigma_{x,i},\sigma_{v,i}) = - \int d\vec{x}\, d\vec{v}\; p(\vec{x}, \vec{v} | t, \vec{x}_i, \vec{v}_i, \sigma_{x,i},\sigma_{v,i}) \log \left[p(\vec{x}, \vec{v} | t, \vec{x}_i, \vec{v}_i, \sigma_{x,i},\sigma_{v,i})\right]$

What does this entropy mean? Something quite close to what thermodynamic entropy means! It is the logarithm of the average configuration-space volume where I expect to find the given planet at instant $t$ (2). And it's the entropy of a 'single particle'.
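For the simplest caricature of this (my own sketch: a free particle in one dimension rather than a planet under gravity), the entropy growth can be written in closed form. If $x(0)$ and $v$ are independent Gaussians, then $x(t) = x(0) + vt$ is Gaussian with variance $\sigma_x^2 + t^2\sigma_v^2$, and its differential entropy, the log of the effective volume where we expect to find the particle, grows with $t$:

```python
import math

def position_entropy(t, sigma_x, sigma_v):
    """Differential entropy (in nats) of x(t) = x(0) + v*t for a free
    particle with independent x(0) ~ N(0, sigma_x^2), v ~ N(0, sigma_v^2).

    Var[x(t)] = sigma_x^2 + t^2 * sigma_v^2, and a Gaussian with
    variance s^2 has differential entropy (1/2) log(2 pi e s^2).
    """
    var_t = sigma_x**2 + (t * sigma_v)**2
    return 0.5 * math.log(2 * math.pi * math.e * var_t)
```

The exponential of this entropy is proportional to the spread $\sqrt{\sigma_x^2 + t^2\sigma_v^2}$: the 'volume where I expect to find the planet' grows as the initial velocity uncertainty accumulates.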

There's no problem with that. I can even have situations where I must maximize this entropy! Suppose I don't know the position of planet 2, but I do know that all three planets have coplanar orbits. There are well-defined procedures in information and inference theory that tell me one way of dealing with this: find the value of $\vec{x}_2$ that maximizes the entropy, subject to the constraint that all orbits are in the same plane, and then substitute this value into the original distribution. This is often called the "principle of maximum ignorance".
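The recipe above, maximize the entropy subject to what you do know, can be sketched for a toy discrete problem (my own hypothetical example, not the planetary one): among all distributions on a finite set with a prescribed mean, the maximum-entropy solution has the exponential (Gibbs) form $p(x) \propto e^{-\lambda x}$, with the multiplier $\lambda$ fixed by the constraint.

```python
import math

def maxent_given_mean(values, target_mean):
    """Maximum-entropy distribution on a finite set of values subject
    to a fixed mean. The solution is p(x) ∝ exp(-lam * x); we find the
    Lagrange multiplier `lam` by bisection, since the mean of this
    family decreases monotonically in lam."""
    def mean_for(lam):
        w = [math.exp(-lam * v) for v in values]
        Z = sum(w)
        return sum(v * wi for v, wi in zip(values, w)) / Z

    lo, hi = -50.0, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_for(mid) > target_mean:
            lo = mid  # mean too high -> need larger lam
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    w = [math.exp(-lam * v) for v in values]
    Z = sum(w)
    return [wi / Z for wi in w]
```

When the constrained mean sits at the center of a symmetric set of values, $\lambda = 0$ and the maximum-ignorance answer is the uniform distribution, as one would hope.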

There are interpretations of thermodynamics and statistical physics as instances of this type of inference problem (please refer to the works of E. T. Jaynes; I'll give a list of references below). In this interpretation there's nothing special about having many degrees of freedom, besides the fact that this is what makes you ignorant about the details of the system. This ignorance is what brings probabilities, entropies and maximum entropy principles to the table.

Rephrasing it a bit: probabilities and entropies become part of your description when ignorance is built into your model. This ignorance could be a fundamental one - you can't know something about your system; a technical one - you could know it if you had better instruments; or even, as in the case of statistical physics, a deliberate one - you could know, at least in principle, but you choose to leave the details out because they aren't relevant at the scale you're interested in. But the details of **how** you use probabilities, entropies and maximum entropy principles are completely agnostic to the sources of your ignorance. They are a tool for dealing with ignorance, no matter the reason you are ignorant.

(1) For information-theoretic arguments for why we maximize entropy in thermodynamics, please refer to E. T. Jaynes' famous book "Probability Theory: The Logic of Science" (3) and this series of articles:

Jaynes, E. T., 1957, Information Theory and Statistical Mechanics, Phys. Rev., 106, 620

Jaynes, E. T., 1957, Information Theory and Statistical Mechanics II, Phys. Rev., 108, 171.

Another interesting source:

Caticha, A., 2008, Lectures on Probability, Entropy and Statistical Physics, arxiv:0808.0012

(2) This can be given a rigorous meaning within information theory. For a sequence of $n$ independent draws from a distribution $p(x)$, let $A_\epsilon$ be the smallest set of sequences with total probability greater than $1 - \epsilon$. Then for large $n$ the size of this set satisfies:

$\frac{1}{n}\log |A_\epsilon| = S + O(\epsilon)$

For the precise form of this result (the asymptotic equipartition property), see the book "Elements of Information Theory" by Cover and Thomas.
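As a numerical sanity check of footnote (2) (my own sketch, for i.i.d. coin flips rather than a general source), one can build the smallest set of $n$-flip sequences carrying probability $1-\epsilon$ by greedily collecting the most probable sequences; its log-size per flip lands close to the entropy $S$ even for modest $n$:

```python
import math
from math import comb

def smallest_set_log_size(p, n, eps):
    """For n i.i.d. flips of a coin with bias p, return (1/n) log2 of
    the size of the smallest set of sequences whose total probability
    exceeds 1 - eps. All C(n, k) sequences with k heads share the
    probability p^k (1-p)^(n-k), so we work with the n+1 groups and
    greedily take the most probable sequences first."""
    groups = sorted(
        ((p**k * (1 - p)**(n - k), comb(n, k)) for k in range(n + 1)),
        key=lambda g: g[0], reverse=True)
    total, size = 0.0, 0
    for prob, count in groups:
        if total + prob * count <= 1 - eps:
            # whole group still leaves us short of 1 - eps: take it all
            total += prob * count
            size += count
        else:
            # take just enough sequences from this group to pass 1 - eps
            size += math.floor((1 - eps - total) / prob) + 1
            break
    return math.log2(size) / n
```

For a coin with $p = 0.3$ this set is a small fraction of all $2^n$ sequences, and its per-flip log-size is within a few percent of $S \approx 0.88$ bits already at $n = 20$.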

(3) Some of Jaynes's rants about quantum theory in this book may appear odd today, but let's excuse him. He committed some errors too. Just focus on the probability theory, information theory and statistical physics stuff which is quite amazing. :)

(4) It seems that dealing with these kinds of problems in celestial mechanics was actually one of the first things that got Laplace interested in probability, and apparently he used it in his celestial mechanics calculations. The other problem that drew his attention towards probability theory was... gambling! Hahaha...

## Best Answer

You are thinking about Boltzmann's definition of entropy, I guess?

In Boltzmann's definition, entropy is just the logarithm of the number of possible microstates associated with certain macroscopic variables. In this generality, it doesn't seem to me to exclude counting states with different time coordinates - or, in your more general context, on different time slices. The question is: what does this correspond to? Does it make sense to do that? You would have to specify the time development of the macroscopic variables and count the number of microscopic trajectories compatible with those macroscopic trajectories.

As a matter of fact, there exist so-called dynamical entropies. Heuristically, they count the density of phase-space trajectories of a system, whereas the Boltzmann entropy just counts the number of accessible states under certain macroscopic constraints.

http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Sinai_entropy#Measure-theoretic_entropy