Yes, the density matrix combines the quantum and the classical aspects of the probabilities in such a way that these two "parts" can no longer be separated in any invariant way.
As the OP states in the discussion, the same density matrix may be prepared in numerous ways. One of them may look more "classical" (e.g. the method following the simple diagonalization from equation 1), and another one may look more quantum, relying on states that are not orthogonal and/or that interfere with each other (like equations 2).
But all predictions may be written in terms of the density matrix. For example, the probability that we will observe the property given by the projection operator $P_B$ is
$$ {\rm Prob}_B = {\rm Tr}(\rho P_B) $$
So whatever procedure produced $\rho$ will always yield the same probabilities for anything.
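A quick numerical check of this point, as a sketch assuming Python with NumPy: two different preparations of a qubit, one from the orthogonal states $|0\rangle, |1\rangle$ and one from three non-orthogonal "trine" states (my own choice of example, not taken from the question), give the same density matrix and therefore the same ${\rm Tr}(\rho P_B)$. The projector onto $|+\rangle$ used below is just one illustrative choice of $P_B$.

```python
import numpy as np

# Computational-basis kets |0>, |1> for a single qubit.
ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

def density_matrix(probs, kets):
    """rho = sum_i p_i |psi_i><psi_i| for an ensemble {p_i, |psi_i>}."""
    return sum(p * np.outer(k, k.conj()) for p, k in zip(probs, kets))

# Ensemble A: "classical-looking", orthogonal states with equal weights.
rho_a = density_matrix([0.5, 0.5], [ket0, ket1])

# Ensemble B: three non-orthogonal "trine" states 120 degrees apart, equal weights.
angles = (0.0, 2 * np.pi / 3, 4 * np.pi / 3)
trine = [np.array([np.cos(t), np.sin(t)]) for t in angles]
rho_b = density_matrix([1 / 3, 1 / 3, 1 / 3], trine)

def prob(rho, proj):
    """Born rule for a density matrix: Prob_B = Tr(rho P_B)."""
    return np.trace(rho @ proj).real

# Illustrative projector P_B onto |+> = (|0> + |1>)/sqrt(2).
plus = (ket0 + ket1) / np.sqrt(2)
P_B = np.outer(plus, plus.conj())

print(np.allclose(rho_a, rho_b))           # True: both ensembles give I/2
print(prob(rho_a, P_B), prob(rho_b, P_B))  # both approximately 0.5
```

Any other projector could be substituted for `P_B`; since the two matrices are equal, the two probabilities always agree.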
Unlike other users, I do think that this observation by the OP has a nontrivial content, at least at the philosophical level. In a sense, it implies that the density matrix with its probabilistic interpretation should be interpreted exactly in the same way as the phase space distribution function in statistical physics – and the "quantum portion" of the probabilities inevitably arises out of this generalization because the matrices don't commute with each other.
Another way to phrase the same interpretation: In classical physics, everyone agrees that we may have incomplete knowledge about a physical system and use the phase space probability distribution to quantify that. Now, if we also agree that the probabilities of different, mutually exclusive states (eigenstates of the density matrix) are given by the eigenvalues of the density matrix, and if we assume that there is a smooth formula for the probabilities of other properties, then it follows that even pure states, whose density matrices have eigenvalues $1,0,0,0,\dots$, must imply probabilistic predictions for most quantities. Apart from the nonzero commutators of observables (matrices), the interference-related quantum probabilities are no different and no "weirder" than the classical probabilities related to incomplete knowledge.
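To make the pure-state claim concrete, here is a small sketch (Python with NumPy assumed; the state $|+\rangle$ and the projector onto $|0\rangle$ are illustrative choices): the density matrix of the pure state has eigenvalues $0$ and $1$, yet it predicts a 50/50 outcome for a projector that does not commute with it.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
plus = np.array([1.0, 1.0]) / np.sqrt(2)

rho = np.outer(plus, plus.conj())   # pure state |+><+|
P0 = np.outer(ket0, ket0.conj())    # projector onto |0>

# Eigenvalues of rho, ascending: approximately [0, 1], i.e. a pure state.
eigenvalues = np.linalg.eigvalsh(rho)
# Probability of finding |0>: approximately 0.5, a genuinely random outcome.
p0 = np.trace(rho @ P0).real
# rho and P0 do not commute, which is where the "quantum" randomness comes from.
commutator = rho @ P0 - P0 @ rho

print(eigenvalues, p0, np.allclose(commutator, 0))
```

Replacing `P0` with the projector onto $|+\rangle$ itself (which commutes with `rho`) would give probability 1, i.e. no randomness at all.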
> On the other hand, the basis kets themselves are pure states, and the probability of observing $| \psi_i \rangle$ is $|c_i|^2$, so we should be able to express the density operator in terms of these states...
This statement is where your thinking went wrong. Your first density matrix describes a pure state in superposition, while your second density matrix describes a mixed state. There is a crucial difference between a quantum superposition (the real state you are dealing with) and a mixed state (the state that you THINK the real state is equivalent to).
Take a simple two-state system. A superposition is of the form:
$$| \psi \rangle = c_0 | 0 \rangle +c_1 | 1 \rangle$$
You know the state FOR SURE! But when you measure it, you still get probabilistic outcomes.
For a mixed state, you don't know what the state is! Say you have a hundred particles, of which $|c_0|^2 \cdot 100$ are in the state $| 0 \rangle$ and $|c_1|^2 \cdot 100$ are in $| 1 \rangle$. Now you randomly take a particle from this collection. Sure, it is true that "the probability of observing $| 0 \rangle$ is $|c_0|^2$" and "the probability of observing $| 1 \rangle$ is $|c_1|^2$". But you are really talking about your ignorance of the system rather than any inherent uncertainty upon measurement. The mixed state is fundamentally about classical randomness.
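A sketch of the difference, assuming Python with NumPy and the illustrative amplitudes $c_0=\sqrt{0.3}$, $c_1=\sqrt{0.7}$: the superposition and the mixture give identical statistics in the $\{|0\rangle, |1\rangle\}$ basis, but only the superposition has off-diagonal terms, and those show up as different statistics in the $\{|+\rangle, |-\rangle\}$ basis.

```python
import numpy as np

c0, c1 = np.sqrt(0.3), np.sqrt(0.7)  # illustrative amplitudes, |c0|^2 + |c1|^2 = 1
ket0, ket1 = np.eye(2)               # rows of the identity: |0>, |1>
psi = c0 * ket0 + c1 * ket1

# Pure superposition: rho = |psi><psi| has off-diagonal (coherence) terms.
rho_pure = np.outer(psi, psi.conj())
# Classical mixture: a fraction |c0|^2 of particles in |0>, |c1|^2 in |1>.
rho_mixed = abs(c0) ** 2 * np.outer(ket0, ket0) + abs(c1) ** 2 * np.outer(ket1, ket1)

def prob(rho, ket):
    """Probability of outcome |ket>: Tr(rho |ket><ket|)."""
    return np.trace(rho @ np.outer(ket, ket.conj())).real

plus = (ket0 + ket1) / np.sqrt(2)

# Same statistics in the {|0>, |1>} basis (both approximately 0.3) ...
print(prob(rho_pure, ket0), prob(rho_mixed, ket0))
# ... but different statistics in the {|+>, |->} basis (about 0.958 vs 0.5).
print(prob(rho_pure, plus), prob(rho_mixed, plus))
```

The value $0.958\ldots$ is $|c_0+c_1|^2/2 = 0.5 + \sqrt{0.21}$, an interference term that the mixture cannot produce.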
P.S. The off-diagonal terms signal the presence of quantum interference (coherence) in the system. In case you are curious, quantum decoherence actually drives your first density matrix toward your second one when your particle interacts with an environment, i.e. when it is an open system. I will just provide one reference: http://vvkuz.ru/books/zurek.pdf
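As a toy illustration only (a simple pure-dephasing model that I am assuming, with a hypothetical coherence time `T2`; it is not the environment-induced derivation in the reference), damping the off-diagonal terms by $e^{-t/T_2}$ while leaving the populations untouched carries the superposition's density matrix onto the mixture's:

```python
import numpy as np

def dephase(rho, t, T2):
    """Toy pure-dephasing model (an assumption, not a full environment model):
    populations stay fixed, coherences decay as exp(-t / T2)."""
    decay = np.exp(-t / T2)
    out = rho.astype(complex)  # astype returns a copy, so rho is not modified
    out[0, 1] *= decay
    out[1, 0] *= decay
    return out

c0, c1 = np.sqrt(0.3), np.sqrt(0.7)  # illustrative amplitudes
psi = np.array([c0, c1])
rho_pure = np.outer(psi, psi.conj())
rho_mixed = np.diag([abs(c0) ** 2, abs(c1) ** 2])

T2 = 1.0  # hypothetical coherence time, arbitrary units
for t in (0.0, 1.0, 5.0, 20.0):
    dist = np.abs(dephase(rho_pure, t, T2) - rho_mixed).max()
    print(f"t = {t:5.1f}  max |rho(t) - rho_mixed| = {dist:.2e}")
```

The distance shrinks exponentially with `t`; real decoherence also selects *which* basis becomes diagonal, which this toy model simply assumes to be $\{|0\rangle, |1\rangle\}$.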
Search on SE and Google for more details!
Firstly, what is a state?
A state gives you the complete description of a system. Let's label the state of a system $\lvert \psi \rangle$. This is a normalised state vector which belongs to the vector space of states. Keep in mind that we are talking about the full state; I haven't decomposed it into basis states, and I will not. This is not what the density matrix is all about.
The state vector description is a powerful one, but it is not the most general. There are some quantum experiments for which no single state vector can give a complete description. These are experiments that have additional randomness or uncertainty, which might mean that either state $\lvert \psi_1 \rangle$ or $\lvert \psi_2 \rangle$ is prepared. This additional randomness or uncertainty can arise from imperfect devices used in experiments, which inevitably introduce classical randomness, or from correlations with another system due to quantum entanglement.
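For the entanglement case, a minimal sketch (Python with NumPy assumed, using the density matrix formalism this answer goes on to introduce): take two qubits in the Bell state $(|00\rangle+|11\rangle)/\sqrt{2}$. The pair is in a pure state, but qubit A on its own is described by the reduced density matrix $\rho_A = {\rm Tr}_B\,\rho_{AB} = I/2$, which no single state vector reproduces. The partial trace is written here with `np.einsum` after a reshape; that is one convenient way to do it, not the only one.

```python
import numpy as np

ket0, ket1 = np.eye(2)
# Bell state (|00> + |11>)/sqrt(2) on qubits A and B.
bell = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)
rho_ab = np.outer(bell, bell.conj())  # pure state of the pair

# Partial trace over B: view rho_AB with indices (a, b, a', b') and sum over b = b'.
rho_a = np.einsum("ijkj->ik", rho_ab.reshape(2, 2, 2, 2))

purity_ab = np.trace(rho_ab @ rho_ab).real  # 1 for a pure state
purity_a = np.trace(rho_a @ rho_a).real     # below 1 for a mixed state

print(rho_a)                # approximately I/2: qubit A alone is maximally mixed
print(purity_ab, purity_a)  # approximately 1.0 and 0.5
```

The purity ${\rm Tr}(\rho^2)$ is used here as a simple test: it equals 1 exactly for pure states and is smaller for mixed ones.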
In this case, then, it is convenient to introduce the density matrix formalism. Since in quantum mechanics all we calculate are expectation values, how would you go about calculating the expectation value of an observable in an experiment where, in addition to the intrinsic quantum mechanical randomness, you also have this classical randomness arising from imperfections in your experiment?
Recall that $$Tr(\lvert\phi_1\rangle\langle\phi_2\rvert)=Tr(\lvert\phi_1\rangle\otimes\langle\phi_2\rvert)=\langle\phi_2\mid\phi_1\rangle,$$ and $$\hat{O}\circ(\lvert\phi_1\rangle\langle\phi_2\rvert)=(\hat{O}\lvert\phi_1\rangle)\otimes\langle\phi_2\rvert$$
Now, using the linearity of the trace, we can compute the expectation value as:
$$ \langle \hat{O} \rangle = p_1\langle \psi_1 \rvert \hat{O} \lvert \psi_1 \rangle + p_2\langle \psi_2 \rvert \hat{O} \lvert \psi_2 \rangle$$ $${} = p_1 Tr(\hat{O} \lvert \psi_1 \rangle \langle \psi_1 \rvert) + p_2 Tr(\hat{O} \lvert \psi_2 \rangle \langle \psi_2 \rvert) $$ $${} = Tr\big(\hat{O} (p_1 \lvert \psi_1 \rangle \langle \psi_1 \rvert + p_2 \lvert \psi_2 \rangle \langle \psi_2 \rvert)\big) = Tr(\hat{O} \rho)$$
where $p_1$ and $p_2$ are the corresponding classical probabilities of each state being prepared, and $\rho$ is what we call the density matrix (aka density operator): it contains all the information needed to calculate any expectation value for the experiment.
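The identities and the final equality can be checked numerically. This is a sketch assuming Python with NumPy; the random states, the operator, and the weights $p_1=0.25$, $p_2=0.75$ are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_ket(n):
    """Random normalised complex state vector (illustrative only)."""
    v = rng.normal(size=n) + 1j * rng.normal(size=n)
    return v / np.linalg.norm(v)

# 1) The two identities used in the derivation.
phi1, phi2 = random_ket(3), random_ket(3)
O = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))  # arbitrary operator
outer = np.outer(phi1, phi2.conj())                          # |phi1><phi2|
# Tr(|phi1><phi2|) = <phi2|phi1>; np.vdot conjugates its FIRST argument.
trace_ok = np.isclose(np.trace(outer), np.vdot(phi2, phi1))
# O (|phi1><phi2|) = (O|phi1>) <phi2|
compose_ok = np.allclose(O @ outer, np.outer(O @ phi1, phi2.conj()))

# 2) <O> for a two-state mixture, computed both ways.
p1, p2 = 0.25, 0.75                # hypothetical preparation probabilities
psi1, psi2 = random_ket(3), random_ket(3)
H = O + O.conj().T                 # Hermitian observable built from O
lhs = p1 * np.vdot(psi1, H @ psi1) + p2 * np.vdot(psi2, H @ psi2)
rho = p1 * np.outer(psi1, psi1.conj()) + p2 * np.outer(psi2, psi2.conj())
rhs = np.trace(H @ rho)

print(trace_ok, compose_ok)  # True True
print(np.isclose(lhs, rhs))  # True
```

Note that `psi1` and `psi2` are generally not orthogonal here; the equality $\langle \hat{O} \rangle = Tr(\hat{O}\rho)$ does not require them to be.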
So your suggestion 1 is correct, but suggestion 2 is not, as this is not a superposition. The system is definitely in one state; we just don't know which one because of classical uncertainty about the preparation.