For a pure state, by definition,
$$\rho = |\psi\rangle\langle \psi| $$
So it is a projection operator onto the pure state $|\psi\rangle$. Note that ${\rm Tr}(\rho L)=\langle\psi|L|\psi\rangle$ for this density matrix. So it follows that
$$\rho^2 = |\psi\rangle\langle \psi|\psi\rangle\langle \psi|=|\psi\rangle\langle \psi|=\rho $$
and ${\rm Tr}(\rho^2)=1$ follows from the usual normalization conditions for the overall probability ${\rm Tr}(\rho)=\langle\psi|\psi\rangle=1$.
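These identities are easy to check numerically. Here is a minimal sketch (using numpy, with a randomly chosen state in a 3-dimensional Hilbert space; the dimension and seed are arbitrary choices for illustration):

```python
import numpy as np

# Build a random normalized state vector |psi>.
rng = np.random.default_rng(0)
psi = rng.normal(size=3) + 1j * rng.normal(size=3)
psi /= np.linalg.norm(psi)                      # enforces <psi|psi> = 1

rho = np.outer(psi, psi.conj())                 # rho = |psi><psi|

# rho is a projector: rho^2 = rho, hence Tr(rho^2) = Tr(rho) = 1.
assert np.allclose(rho @ rho, rho)
assert np.isclose(np.trace(rho @ rho).real, 1.0)

# Tr(rho L) = <psi|L|psi> for an arbitrary (here random Hermitian) L.
L = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
L = L + L.conj().T
assert np.isclose(np.trace(rho @ L), psi.conj() @ L @ psi)
```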

I think it is a mistake, in this case, to think of entropy as "a description of our ignorance." Rather, I would suggest that you think of entropy as a well-defined, objective property *provided* that you specify which degrees of freedom in the universe are inside and outside of your system. The content of this statement isn't really different, but it emphasizes that entropy is an objective property and not observer-dependent.

If your included list is "everything" (or at least everything that has ever interacted together in the history of your system), then what you said is true: if you started out with a pure state it will always remain so, and there isn't much thermodynamics to speak of.

The basic question of thermodynamics (and, more broadly, statistical mechanics) is what happens in *any other case* - most typically, the case in which the degrees of freedom you specify are continuously coupled to an open system in some way. Somewhat amazingly, there is a general answer to this question for many such arrangements.

More concretely, in classical thermodynamics one of the important roles of entropy and temperature is that they tell you how much work you can extract from a system. So one way to rephrase your question is: "How can properties like the maximum extractable work come out of ignorance of information?" But it is easy to think of situations in which this is the case. As a toy model, imagine a sailor trying to navigate a sailboat through a storm, with the wind changing wildly and rapidly. If he somehow knows beforehand exactly when and how the wind will shift, he will have a much easier time moving in the direction he wants.

Ultimately, a similar game is being played on a microscopic level when one speaks, for example, of the maximum efficiency of a heat engine. The explicit connection is made by Landauer's principle, which is the direct link you're looking for between the included degrees of freedom (or, if you insist, "knowledge") and work. This principle was inspired by the famous thought experiment of Maxwell's demon, which is a microscopic equivalent of my weather-predicting sailor.

## Best Answer

Since an arbitrary $\rho$ is self-adjoint, it has the spectral decomposition $\rho = \sum_n \rho_n |\psi_n\rangle\langle\psi_n|$ in terms of an orthonormal basis $\{ |\psi_n\rangle \}$, which here we take to be discrete for simplicity.

Hermiticity implies that the eigenvalues are real, $\rho_n = \rho^*_n$. The trace condition $\mathrm{Tr}\, \rho = 1$ implies $\sum_n \rho_n = 1$, and positive semi-definiteness implies $\rho_n \geq 0$. Together these give $0 \leq \rho_n \leq 1$, which implies $\rho^2_n \leq \rho_n$. Hence $\mathrm{Tr}\, \rho^2 = \sum_n \rho^2_n \leq \sum_n \rho_n = 1$, so $\mathrm{Tr}\, \rho^2 \leq 1$ for a generic state, as you mention.
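The eigenvalue bounds above can be verified numerically. The following sketch (numpy; the dimension, seed, and random basis are illustrative choices) builds a generic mixed state as a convex combination of projectors and checks $0 \leq \rho_n \leq 1$ and $\mathrm{Tr}\, \rho^2 \leq 1$:

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 4
p = rng.random(dim)
p /= p.sum()                                    # probabilities summing to 1

# Random orthonormal basis from a QR decomposition of a complex matrix.
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim))
                    + 1j * rng.normal(size=(dim, dim)))
rho = sum(p[n] * np.outer(Q[:, n], Q[:, n].conj()) for n in range(dim))

evals = np.linalg.eigvalsh(rho)                 # the rho_n, real by Hermiticity
assert np.all(evals >= -1e-12) and np.all(evals <= 1 + 1e-12)
assert np.isclose(np.trace(rho).real, 1.0)      # sum_n rho_n = 1

purity = np.trace(rho @ rho).real               # sum_n rho_n^2
assert purity <= 1 + 1e-12
```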

Now assume instead that $\mathrm{Tr}\, \rho^2 = 1$. By the inequalities just derived, this forces $\rho^2_n = \rho_n$ for all $n$, i.e. $\rho^2 = \rho$, so each eigenvalue satisfies $\rho_n = 0$ or $\rho_n = 1$. The trace condition $\mathrm{Tr}\, \rho = 1$ then allows exactly one $\rho_n$ to equal one, with all the others vanishing. This is a pure state.

The reverse implication is direct: a pure state satisfies $\rho^2 = \rho$, so $\mathrm{Tr}\, \rho^2 = \mathrm{Tr}\, \rho = 1$.
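The two directions together give a simple computational purity test. A minimal sketch (numpy; the helper name `is_pure` and the example qubit states are my own illustrative choices):

```python
import numpy as np

def is_pure(rho, tol=1e-10):
    """Purity criterion: Tr(rho^2) = 1 if and only if rho is pure."""
    return abs(np.trace(rho @ rho).real - 1.0) < tol

psi = np.array([1.0, 1.0]) / np.sqrt(2)
pure = np.outer(psi, psi)                       # projector |psi><psi|
mixed = 0.5 * np.eye(2)                         # maximally mixed qubit

assert is_pure(pure)                            # Tr(rho^2) = 1
assert not is_pure(mixed)                       # Tr(rho^2) = 1/2 here
```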