Canonical Ensemble – How to Derive Canonical Ensemble from Maximum Entropy Principle

Tags: density-operator, entropy, statistical-mechanics

Consider a quantum system with Hamiltonian $H$, described by a density operator $\rho$. It is known that the expectation value of the energy of this system is

$$
\langle H \rangle \equiv \text{tr}(\rho H) = E
$$

I want to determine the density operator $\rho$ by the principle of maximum entropy, without the need to introduce a large number of imaginary copies (an ensemble) of the system (this is the route followed by Pathria and Beale). Here the entropy should be the von Neumann entropy of the density operator

$$
S(\rho) = \text{tr}(-\rho \ln \rho)
$$

Then I need to minimize $-S(\rho) = \text{tr}(\rho \ln \rho)$ under the following constraints:

  • $E = \text{tr}(\rho H)$
  • $\rho$ is a density operator, i.e. it is positive definite and $\text{tr}(\rho) = 1$.

Is it possible to follow this logic and finally arrive at

$$
\rho = \frac{1}{Z} e^{-\beta H} \quad ?
$$

Here $\beta = 1/T$ is the inverse temperature and $Z = \text{tr}(e^{-\beta H})$ is the partition function.
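(For concreteness, here is a minimal numerical sketch of the target state, assuming NumPy/SciPy and a made-up 4-level Hamiltonian; it just verifies that $e^{-\beta H}/Z$ is a valid density operator with a definite mean energy.)

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

# Made-up 4-level Hamiltonian: any Hermitian matrix will do.
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (A + A.conj().T) / 2

beta = 1.3                            # inverse temperature, beta = 1/T
Z = np.trace(expm(-beta * H)).real    # partition function Z = tr(e^{-beta H})
rho = expm(-beta * H) / Z             # the candidate density operator

print(np.trace(rho).real)             # -> 1.0, i.e. tr(rho) = 1
print(np.trace(rho @ H).real)         # -> E, the mean energy <H>
p = np.linalg.eigvalsh(rho)
print(np.all(p > 0))                  # -> True, rho is positive definite
print(-np.sum(p * np.log(p)))         # von Neumann entropy S(rho)
```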


Up to now I tried the following: construct the Lagrange function

$$
\Lambda(\rho, \lambda)
\equiv -S(\rho)
+ \lambda_1 (\text{tr}\,\rho - 1)
+ \lambda_2 (\text{tr}(\rho H) - E)
$$

where the $\lambda_i$ are Lagrange multipliers. The minimization condition is

$$
\begin{align*}
\frac{\delta \Lambda}{\delta \rho}
&= \text{tr}\,\frac{\delta}{\delta \rho} (
\rho \ln \rho + \lambda_1 \rho
+ \lambda_2 \rho H
)
\\
&= \text{tr}(1 + \ln \rho + \lambda_1 + \lambda_2 H)
\overset{!}{=} 0
\end{align*}
$$

Thus

$$
\rho = \exp(-1 - \lambda_1 - \lambda_2 H)
$$

But then

  • The variation $\delta(\rho \ln \rho)/\delta \rho$ may not be a simple generalization of $d(x \ln x)/dx = 1 + \ln x$, since $\rho$ is now a linear operator (it can be thought of as a matrix, and I am varying its matrix elements).
  • I was sloppy to say that $tr(1 + \ln \rho + \lambda_1 + \lambda_2 H) = 0$ implies $1 + \ln \rho + \lambda_1 + \lambda_2 H = 0$.
  • How to incorporate the constraint that $\rho$ is positive definite?
  • How to relate the Lagrange multipliers $\lambda_1, \lambda_2$ to the temperature $T = 1/\beta$? (I hope that this can be done without too much classical thermodynamics.)
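For concreteness, here is the kind of check I would like the argument to justify (a minimal numerical sketch, assuming SciPy and a made-up spectrum of $H$, working directly in the energy eigenbasis): maximizing $S$ over the $p_j$ under the two equality constraints does numerically reproduce the Boltzmann weights.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Made-up spectrum of H (its eigenvalues) and a target inverse temperature.
eps = np.sort(rng.normal(size=6))
beta = 0.8
p_gibbs = np.exp(-beta * eps) / np.exp(-beta * eps).sum()
E = (p_gibbs * eps).sum()             # the fixed mean-energy constraint

def neg_entropy(p):
    return np.sum(p * np.log(p))      # -S(p), to be minimized

constraints = [
    {'type': 'eq', 'fun': lambda p: p.sum() - 1.0},         # tr(rho) = 1
    {'type': 'eq', 'fun': lambda p: (p * eps).sum() - E},   # <H> = E
]
p0 = np.full_like(eps, 1.0 / len(eps))                      # uniform start
res = minimize(neg_entropy, p0, bounds=[(1e-12, 1.0)] * len(eps),
               constraints=constraints)

print(np.allclose(res.x, p_gibbs, atol=1e-4))   # -> True: Boltzmann weights
```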

Best Answer

A good way to start the proof is to first pick a basis in which the density matrix is diagonal, $$\rho=\sum_{j}{p_j|\psi_j\rangle \langle\psi_j|},$$ where $p_j>0$ and $\{|\psi_j\rangle\}$ is orthonormal. Such a decomposition exists for any density matrix $\rho$, since we already assume $\rho$ to be positive-definite, which implies that it is Hermitian and hence diagonalizable. Also, $\text{tr}(\rho)=1$ gives $\sum_{j}{p_j}=1$, as expected.
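(A minimal illustration of this decomposition, assuming NumPy and a randomly generated density matrix: the eigendecomposition directly produces the orthonormal $|\psi_j\rangle$ and the weights $p_j$.)

```python
import numpy as np

rng = np.random.default_rng(2)

# A randomly generated (made-up) density matrix: positive and trace-one.
B = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = B @ B.conj().T
rho /= np.trace(rho).real

# Spectral decomposition rho = sum_j p_j |psi_j><psi_j|.
p, psi = np.linalg.eigh(rho)    # p[j] = p_j, psi[:, j] = |psi_j>

print(np.allclose(psi.conj().T @ psi, np.eye(4)))   # orthonormal basis
print(np.isclose(p.sum(), 1.0))                     # sum_j p_j = 1
print(np.all(p > 0))                                # p_j > 0
```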

Now look back at the minimization condition you derived; in this basis it reads $$\delta \Lambda=\sum_{j}{\delta p_{j}\,(1+\log{p_j}+\lambda_1+\lambda_2 \langle\psi_j| H |\psi_j\rangle)}=0$$ Since the variations $\delta p_j$ are treated as independent in the Lagrange formalism, each bracket must vanish; the multiplier $\lambda_1$ is then fixed by normalization and absorbed into $Z=\sum_{j}{e^{-\lambda_2 \langle\psi_j| H |\psi_j\rangle}}$, so that $$p_j={1 \over Z}{e^{-\lambda_2 \langle\psi_j| H |\psi_j\rangle}}$$ But let us inspect what this minimization condition means: we fixed $\{|\psi_j\rangle\}$ to be some basis, and by varying the $p_j$ under the constraints $\sum_{j}{p_j}=1$ and fixed mean energy, we minimized the negative entropy $-S$. There are still choices of basis that can vary $S$ further, so which basis maximizes $S$?

To find the answer, we recompute $S$: \begin{align} S & = -\sum_{j}{p_j\log{p_j}} \\ & = -\sum_{j}{\big(p_j(-\lambda_2 \langle\psi_j| H |\psi_j\rangle)-p_j\log{Z}\big)} \\ & = \lambda_2 E + \log{Z} \end{align} Taking a variation of both sides (with $E$ held fixed by the constraint), \begin{align} \delta S & = \delta\lambda_2 E + \delta(\log{Z}) \\ & = \delta\lambda_2 E - \delta\lambda_2\sum_{j}{\langle\psi_j| H |\psi_j\rangle e^{-\lambda_2 \langle\psi_j| H |\psi_j\rangle} \over Z} + \delta_{\{|\psi_j\rangle\}}(\log{Z}) \\ & = \delta_{\{|\psi_j\rangle\}}(\log{Z}) \end{align} where the first two terms cancel because $\sum_{j}{\langle\psi_j| H |\psi_j\rangle e^{-\lambda_2 \langle\psi_j| H |\psi_j\rangle}}/Z=\sum_{j}{p_j \langle\psi_j| H |\psi_j\rangle}=E$, and $\delta_{\{|\psi_j\rangle\}}(\log{Z})$ denotes the variation of $\log{Z}$ when we vary only the basis $\{|\psi_j\rangle\}$ and keep $\lambda_2$ fixed.
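Here is a quick numerical check of the intermediate identity $S=\lambda_2 E+\log Z$, a sketch with made-up diagonal matrix elements $\langle\psi_j|H|\psi_j\rangle$:

```python
import numpy as np

rng = np.random.default_rng(3)

# Made-up diagonal matrix elements eps_j = <psi_j|H|psi_j> in some fixed basis.
eps = rng.normal(size=6)
lam2 = 0.7

Z = np.exp(-lam2 * eps).sum()
p = np.exp(-lam2 * eps) / Z           # p_j = e^{-lambda_2 eps_j} / Z

S = -(p * np.log(p)).sum()            # S = -sum_j p_j log p_j
E = (p * eps).sum()                   # E = sum_j p_j eps_j

print(np.isclose(S, lam2 * E + np.log(Z)))   # -> True: S = lambda_2 E + log Z
```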

So we are left to maximize $Z=\sum_{j}{e^{-\lambda_2 \langle\psi_j| H |\psi_j\rangle}}$ with respect to the basis $\{|\psi_j\rangle\}$ while treating $\lambda_2$ as a constant. Since the exponential function is convex, we have the inequality (you can think of it as an operator version of Jensen's inequality) $$\sum_{j}{e^{-\lambda_2 \langle\psi_j| H |\psi_j\rangle}} \leq \sum_{j}{\langle\psi_j|e^{-\lambda_2 H} |\psi_j\rangle}=\text{tr}(e^{-\lambda_2 H}),$$ where the right-hand side is independent of the basis and equality holds when every $|\psi_j\rangle$ is an eigenvector of $H$. Therefore, $Z$ is maximized at $Z=\sum_{j}{e^{-\lambda_2 E_j}}$, where the $E_j$ form the energy spectrum of $H$, and the density matrix at thermal equilibrium is $$\rho={1 \over Z}{e^{-\lambda_2 H}}$$ as you expected.
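Here is a minimal numerical sketch of this last step (assuming NumPy/SciPy and a made-up Hamiltonian): for fixed $\lambda_2$, the basis-dependent $Z$ never exceeds $\text{tr}(e^{-\lambda_2 H})$, and the bound is saturated in the eigenbasis of $H$.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(4)

# Made-up 5-level Hamiltonian and a fixed lambda_2.
A = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))
H = (A + A.conj().T) / 2
lam2 = 0.9

def Z_in_basis(Q):
    """Z = sum_j exp(-lam2 <psi_j|H|psi_j>) for the orthonormal columns of Q."""
    eps = np.real(np.einsum('ij,jk,ki->i', Q.conj().T, H, Q))
    return np.exp(-lam2 * eps).sum()

Z_max = np.trace(expm(-lam2 * H)).real   # tr(e^{-lam2 H}), basis-independent

# Random orthonormal bases never beat this bound ...
for _ in range(5):
    Q, _ = np.linalg.qr(rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5)))
    print(Z_in_basis(Q) <= Z_max + 1e-12)    # -> True

# ... and the bound is saturated in the eigenbasis of H.
E_j, V = np.linalg.eigh(H)
print(np.isclose(Z_in_basis(V), Z_max))      # -> True
```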

Now let me answer the questions I have not yet replied to:

  1. How to incorporate the constraint that $\rho$ is positive definite?

The property that a matrix is positive-definite is hard to incorporate directly into Lagrange's method, which handles equality constraints, whereas the condition $p_j>0$ is an inequality. However, the density matrix at thermal equilibrium is positive-definite automatically: we are not varying the $p_j$ against some arbitrary function but maximizing a carefully crafted one, the entropy $S$, and the resulting $p_j=e^{-\lambda_2 E_j}/Z$ is strictly greater than $0$ no matter what $H$ and $\lambda_2$ are.

  2. How to relate the Lagrange multipliers $\lambda_1$, $\lambda_2$ to the temperature $T={1 \over \beta}$? (I hope that this can be done without too much classical thermodynamics.)

As you can see in our results, $\lambda_1$ drops out of $p_j$ and $S$: it only enforces the normalization $\text{tr}(\rho)=1$ and is absorbed into $Z$, so it acts as a constant offset that does not influence the statistics of the system. The more interesting multiplier is $\lambda_2$; let us denote $\lambda_2 = \beta$.

Historically there are two versions of entropy. The first is Clausius' version, defined through $\delta S_C=\delta Q / T$, and the second is Gibbs' version, $S_G=-\sum_{j}{p_j\log{p_j}}$ (Boltzmann's entropy of microstates is a special case of Gibbs'). The first version comes with the usual temperature $T$, while the second, as we saw above, naturally yields $\beta$. Here comes the crucial point: starting from $S_C$ we can write the laws of thermodynamics in terms of $T$, and starting from $S_G$ we can write an equivalent set of equations in terms of $\beta$. Both approaches are valid, seen respectively from the macroscopic and the microscopic viewpoint, so the two sets of equations must agree, and by comparing them we obtain $\beta={1 \over T}$ and $\delta S_C = \delta S_G$. There is a quite explicit post about how this is done.
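For reference, here is a minimal sketch of that comparison (standard reasoning, in the units $\beta=1/T$ used throughout this post): with $p_j=e^{-\beta E_j}/Z$ we found $S_G=\beta E+\log Z$, so \begin{align} dS_G & = \beta\,dE + E\,d\beta + d(\log Z) \\ & = \beta\,dE + E\,d\beta - E\,d\beta - \beta\sum_{j}{p_j\,dE_j} \\ & = \beta\Big(dE - \sum_{j}{p_j\,dE_j}\Big), \end{align} using $d(\log Z)=-E\,d\beta-\beta\sum_{j}{p_j\,dE_j}$. The term $\sum_{j}{p_j\,dE_j}$ is the work done on the system by shifting the levels, so $dE-\sum_{j}{p_j\,dE_j}=\delta Q$ and $dS_G=\beta\,\delta Q$. Comparing with the Clausius relation $dS_C=\delta Q/T$ and demanding $dS_C=dS_G$ then gives $\beta=1/T$.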
