[Physics] Von Neumann entropy: How do we get $S(\rho)=-\sum_i p_i \ln(p_i)$

density-operator, entropy, quantum mechanics, statistical mechanics

I would like to understand why the von Neumann entropy can be written as:

$$ S(\rho)=-\sum_i p_i \ln(p_i)$$

as written here: https://en.wikipedia.org/wiki/Von_Neumann_entropy

If I try to prove it from the definition, writing $\rho=\sum_i p_i |\psi_i\rangle\langle\psi_i|$, I get:

$$\begin{aligned}
S(\rho) &= -\operatorname{Tr}(\rho \ln \rho) = -\operatorname{Tr}\Big(\sum_{ij} p_i \ln(p_j)\, |\psi_i \rangle\langle\psi_i|\psi_j\rangle\langle\psi_j|\Big) \\
&= -\sum_{kij} p_i \ln(p_j)\, \langle \phi_k|\psi_i\rangle\langle\psi_i|\psi_j\rangle\langle\psi_j|\phi_k\rangle \\
&= -\sum_{ij} p_i \ln(p_j)\, |\langle \psi_i | \psi_j \rangle|^2
\end{aligned}$$

Since the states making up a density matrix are not orthogonal in general, I don't see how we could end up with the formula on the Wikipedia page.

Best Answer

The density operator is Hermitian, so you can find an orthonormal basis $|\phi_i\rangle$ of eigenvectors for it. This means that there are real numbers $p_i$ (nonnegative and summing to 1, because $\rho$ is positive semidefinite with unit trace) such that

$$\rho |\phi_i\rangle = p_i |\phi_i\rangle.$$

Now, if $f(x)$ is an ordinary function and $A$ is a Hermitian operator with an orthonormal basis of eigenvectors $|a_i\rangle$ and eigenvalues $a_i$, then $f(A)$ is defined on this basis by

$$f(A)|a_i\rangle=f(a_i)|a_i\rangle,$$

which in turn defines $f(A)$ on the whole Hilbert space, since the $|a_i\rangle$ form a basis.
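
Equivalently, the same definition can be written in operator form. This is only a restatement of the rule above, using the spectral decomposition $A=\sum_i a_i |a_i\rangle\langle a_i|$:

$$f(A)=\sum_i f(a_i)\,|a_i\rangle\langle a_i|.$$

Note that this relies on the $|a_i\rangle$ being orthonormal eigenvectors. For a decomposition $\rho=\sum_i q_i |\psi_i\rangle\langle\psi_i|$ into non-orthogonal states $|\psi_i\rangle$, as in the question (writing $q_i$ here for the mixture weights, to distinguish them from the eigenvalues), $\ln\rho$ is in general not equal to $\sum_i \ln(q_i)\,|\psi_i\rangle\langle\psi_i|$. That substitution is the step where the attempted derivation breaks down.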

This is how $f(\rho)=\rho \ln \rho$ is defined, with the convention $0 \ln 0 = 0$ for vanishing eigenvalues. In the eigenbasis of $\rho$ we have

$$f(\rho)|\phi_i\rangle=p_i \ln p_i |\phi_i\rangle.$$

Now remember that the trace can be computed in any orthonormal basis. We compute it in this one, where $f(\rho)$ has matrix elements

$$\langle \phi_j |f(\rho)|\phi_i\rangle = p_i \ln p_i \delta_{ij}.$$

Thus the trace is exactly

$$S(\rho)=-\operatorname{Tr}(\rho\ln \rho)=-\sum_{i}p_i \ln p_i.$$
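
As a numerical sanity check, here is a minimal sketch in plain Python. The example state is my own choice, not part of the original question: an equal mixture of the non-orthogonal qubit states $|0\rangle$ and $|+\rangle=(|0\rangle+|1\rangle)/\sqrt{2}$. The helper names are hypothetical. It shows that the $p_i$ in the formula are the eigenvalues of $\rho$, which differ from the mixture weights when the states are not orthogonal.

```python
import math

def eigenvalues_2x2_symmetric(a, b, d):
    """Eigenvalues of the real symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return mean + radius, mean - radius

def entropy(probs):
    """-sum p ln p, using the convention 0 ln 0 = 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Assumed example: rho = 0.5 |0><0| + 0.5 |+><+|, with |+> = (|0> + |1>)/sqrt(2).
# |0><0| = [[1, 0], [0, 0]] and |+><+| = [[0.5, 0.5], [0.5, 0.5]],
# so rho = [[0.75, 0.25], [0.25, 0.25]].
weights = [0.5, 0.5]
eigs = eigenvalues_2x2_symmetric(0.75, 0.25, 0.25)

s_vn = entropy(eigs)        # von Neumann entropy: uses the eigenvalues of rho
s_naive = entropy(weights)  # entropy of the mixture weights, not S(rho) here

print("eigenvalues of rho:", eigs)
print("S(rho) from eigenvalues:", s_vn)
print("entropy of mixture weights:", s_naive)
```

With these numbers the eigenvalues come out near 0.854 and 0.146, so $S(\rho)$ is roughly 0.42, below the $\ln 2 \approx 0.69$ obtained from the weights. The two values agree only when the states in the mixture are orthonormal, in which case the weights are the eigenvalues.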

As a remark, in my opinion this can also be thought of the other way around. The latter expression for the entropy was known before QM, with $p_i$ being the probabilities of the microstates. To generalize it to quantum mechanics, remember that $\rho$ represents an ensemble, so that when you have a mixed state, you don't actually know the microstate. On the other hand, $\rho$ encodes the probabilities of the microstates as the $p_i$ above.

Thus one could start with the previous knowledge of what $S$ should be in terms of these $p_i$ and arrive at a general expression involving just $\rho$. In simple terms: $S(\rho)$ is defined to yield this result.
