Yes, the density matrix reconciles all quantum aspects of the probabilities with the classical aspect of the probabilities so that these two "parts" can no longer be separated in any invariant way.
As the OP states in the discussion, the same density matrix may be prepared in numerous ways. One of them may look more "classical" – e.g. the method following the simple diagonalization from equation 1 – and another one may look more quantum, depending on states that are not orthogonal and/or that interfere with each other – like equations 2.
But all predictions may be written in terms of the density matrix. For example, the probability that we will observe the property given by the projection operator $P_B$ is
$$ {\rm Prob}_B = {\rm Tr}(\rho P_B) $$
So whatever procedure produced $P_B$ will always yield the same probabilities for anything.
Unlike other users, I do think that this observation by the OP has a nontrivial content, at least at the philosophical level. In a sense, it implies that the density matrix with its probabilistic interpretation should be interpreted exactly in the same way as the phase space distribution function in statistical physics – and the "quantum portion" of the probabilities inevitably arise out of this generalization because the matrices don't commute with each other.
Another way to phrase the same interpretation: In classical physics, everyone agrees that we may have an incomplete knowledge about a physical system and use the phase space probability distribution to quantify that. Now, if we also agree that probabilities of different, mutually excluding states (eigenstates of the density matrix) may be calculated as eigenvalues of the density matrix, and if we assume that there is a smooth formula for probabilities of some properties, then it also follows that even pure states – whose density matrices have eigenvalues $1,0,0,0,\dots$ – must imply probabilistic predictions for most quantities. Except for observables' or matrices' nonzero commutator, the interference-related quantum probabilities are no different and no "weirder" than the classical probabilities related to the incomplete knowledge.
In this post all manipulations are at a rather formal level and no deep mathematical discussion takes places. For that see the references at the end.
To start, consider a bipartite Hilbert space $H=H_A\otimes H_B$ and define the operator $\mathbb I_A \otimes |\psi\rangle : H_A\longrightarrow H $ for some $|\psi\rangle \in H_B$ and the identity operator $\mathbb I_A$ on $H_A$, such that for $|\varphi\rangle \in H_A$ it holds that
$$ \left(\mathbb I_A \otimes |\psi\rangle \right) |\varphi\rangle := |\varphi\rangle \otimes |\psi\rangle \quad . \tag{1}$$
Its adjoint is the following operator $\left(\mathbb I_A \otimes \langle \psi| \right) : H \longrightarrow H_A$, where for $|\varphi\rangle \in H_A, |\phi\rangle \in H_B$ we have
$$ \left(\mathbb I_A \otimes \langle \psi| \right) (|\varphi\rangle \otimes|\phi\rangle) = |\varphi\rangle \langle \psi|\phi \rangle \quad . \tag{2}$$
For a density operator $\rho$ on $H$ we define the partial trace of $\rho$ as
$$\rho_A:=\mathrm{Tr_B}\,\rho := \sum\limits_{k \in K} \left(\mathbb I_A \otimes \langle \psi_k| \right) \rho\left(\mathbb I_A \otimes |\psi_k\rangle \right) \quad , \tag{3}$$
where $\{|\psi_k\rangle\}_{k\in K}$ for an index set $K$ is an orthonormal basis in $H_B$. Note that $\rho_A$ is a linear operator on $H_A$. We can similarly define an operator $\rho_B:=\mathrm{Tr}_A\,\rho$.
Instead of $(3)$ people often write, as a shorthand, something like
$$\rho_A= \sum\limits_{k \in K} \langle \psi_k| \rho |\psi_k\rangle \quad , \tag{4} $$
which can indeed be confusing. Regarding your second concern: I agree (cf. the calculations below), but I guess this could also be a matter of definition and convention, although I've not seen this before. In the end, something like $\langle \varphi|\otimes \langle \psi|$ should denote an element of $H^*$ with $ \langle \varphi|\otimes \langle \psi| \left(|\alpha\rangle \otimes |\beta\rangle\right) = \langle \varphi|\alpha\rangle \langle \psi|\beta\rangle$, for $|\varphi\rangle,|\alpha\rangle \in H_A$ and $|\psi\rangle, |\beta\rangle \in H_B$. So if we put elements of $H_A$ in the first slot of $\otimes$, i.e. on the left of side the tensor product symbol, then it seems natural to me to write $\langle \varphi|\otimes \langle \psi| \in H^*$ instead of $\langle \psi|\otimes \langle \varphi|$.
To proceed, let us now do the explicit calculations: For an operator $O_A$ on $H_A$ we compute
\begin{align}
\mathrm{Tr^{(A)}} \rho_A\,O_A &= \sum\limits_{j\in J} \langle \varphi_j | \rho_A \, O_A|\varphi_j\rangle\\
&= \sum\limits_{j\in J} \langle \varphi_j | \sum\limits_{k \in K} \left(\mathbb I_A \otimes \langle \psi_k| \right) \rho\left(\mathbb I_A \otimes |\psi_k\rangle \right) \, O_A |\varphi_j\rangle \\
&=\sum\limits_{j\in J}\langle \varphi_j | \sum\limits_{k \in K} \left(\mathbb I_A \otimes \langle \psi_k| \right) \rho\, O_A |\varphi_j\rangle \otimes |\psi_k\rangle\tag{5}\\
&=\sum\limits_{j\in J}\sum\limits_{k\in K}\langle \varphi_j|\otimes \langle \psi_k| \,\rho\,O \,|\varphi_j\rangle \otimes |\psi_k\rangle \\
&= \mathrm{Tr}\rho\, O \quad .
\end{align}
Here, $\{|\varphi_j\rangle\}_{j\in J}$ denotes an orthonormal basis in $H_A$, $\mathrm{Tr}^{(A)}$ the trace operation on $H_A$ and $O:= O_A\otimes \mathbb I_B$.
For a more rigorous treatment, see for example S. Attal. Tensor products and partial traces. Lecture Notes, especially section $2.3$. or Michael M. Wolf. Mathematical Introduction to
Quantum Information Processing. Lecture notes, especially theorem 1.35. You can find a pdf for the first reference here and for the second here.
Best Answer
The "classical" portion of the state should be unchanged by complete dephasing. This means that it will never have off-diagonal components in some particular basis. When a state is both q-c and c-q, there are a lot of off-diagonal components that must vanish! Such a state must satisfy $$(\langle \phi_k|\otimes \langle \psi_i |)\rho (| \phi_l\rangle\otimes | \psi_j \rangle)=0$$ when either $k\neq l$ or $i\neq j$. This immediately constrains the state to be c-c.
Take $k=l$. We find that $p_k\langle \psi_i|\rho_k^B|\psi_j\rangle=0$ for all $i\neq j$ by using the c-q expression and our above condition. This means $\rho_k^B$ must be diagonal in the $\{|\psi_j\rangle\}$ basis for all $k$, letting us write it as $$\rho_k^B=\sum_j\lambda_j^{(k)} |\psi_j\rangle\langle\psi_j|.$$ The overall state then becomes, again using the c-q expression: $$\rho=\sum_k\sum_j p_k\lambda_j^{(k)} |\phi_k\rangle\langle \phi_k|\otimes|\psi_j\rangle\langle \psi_j|.$$ This is a c-c state, just with each state $|\phi_k\rangle\langle \phi_k|$ and $|\psi_j\rangle\langle \psi_j|$ repeated multiple times in the expansion (actually every combination of the two is present, although the coefficients may vanish). Similar results can be found by starting from the q-c expression.