The coefficients $P_{ik}$ can be zero in the second expression. They don't have to be nonzero just because the sum runs over them. The actual value depends on your density operator and the basis that you chose. The representations are equivalent and it is as you stated yourself, you obtain the second one by insertion of the linear expansion with respect to the other basis.
You can also switch to the the first representation based on the second by diagonalization of the nondiagonal density matrix. This is nothing but a change of basis.
You can differentiate pure and mixed states by calculation of the trace of powers of the density operator or the density matrix. The density operator of a pure state is idempotent i.e.
$$
\hat \rho \hat \rho = \hat \rho \\
tr(\hat \rho \hat \rho) = tr(\hat \rho )=1
$$
For mixed states this no longer holds and the trace of powers of rho are less than one and $$\hat \rho_{mixed} \hat \rho_{mixed} \neq \hat \rho_{mixed} $$
These properties do not depend on the representation and you can use any density operator representation to check them.
For a quick check you can also look at the matrix elements. If your density operator is based on a pure state then the following holds for the matrix elements,
$$\begin{aligned}
|c_{ij}| &= \sqrt{|c_{ii}|*|c_{jj}|}\\
c_{ij} &= c_{ji}^*
\end{aligned}$$
this means that off-diagonal elements cannot be zero if the diagonal elements aren't zero themselves for a pure state.
Density matrices are, by definition, positive semi-definite trace-class operators of unit trace, see for example this. In the following we will only discuss the finite-dimensional case, so we have to show that the reduced density matrix is positive semi-definite and of unit trace.$^\dagger$
To start, consider a complex bipartite Hilbert space $H=H_A\otimes H_B$ of finite dimension. Let $\rho$ be a density operator on $H$. In particular it holds that
$$
\langle \psi|\rho|\psi\rangle \geq 0 \tag{1}$$
for all $|\psi\rangle \in H$ and
$$\mathrm{Tr} \rho =1 \tag{2} \quad .$$
We define the reduced density operator $\rho_A$ of $\rho$ as
$$ \rho_A:=\mathrm{Tr}_B\,\rho\tag{3} := \sum\limits_{k \in K} \left( \mathbb I_A\otimes\langle \psi_k|\right) \,\rho\, \left(\mathbb I_A\otimes |\psi_k\rangle\right) \quad , $$
for some orthonormal basis $\{|\psi_k\rangle\}_{k\in K}$ in $H_B$, cf. here. Note that $\rho_A$ is a linear operator on $H_A$. We compute its trace:
$$\mathrm{Tr}^{(A)} \rho_A = \sum\limits_{j \in J}\sum\limits_{k \in K} \langle \varphi_j|\otimes \langle \psi_k| \, \rho\, |\varphi_j\rangle \otimes |\psi_k\rangle = \mathrm{Tr} \rho \overset{(2)}{=} 1 \quad , \tag{4} $$
where $\{|\varphi_j\rangle\}_{j\in J}$ and $\{|\varphi_j\rangle \otimes |\psi_k\rangle\}_{j\in J,k\in K}$ are orthonormal bases in $H_A$ and $H$, respectively. We are left to show that $\rho_A$ is positive semi-definite. To do so, define $|\phi_k\rangle := |\varphi\rangle \otimes |\psi_k\rangle$ for an arbitrary $|\varphi\rangle \in H_A$. Then
$$0 \overset{(1)}{\leq}\sum\limits_{k\in K}\langle \phi_k|\rho|\phi_k\rangle = \sum\limits_{k \in K} \langle \varphi| \otimes \langle \psi_k|\,\rho\,|\varphi\rangle \otimes |\psi_k\rangle\overset{(3)}{=}\langle \varphi| \rho_A |\varphi\rangle \tag{5} \quad . $$
In conclusion, since $\rho_A$ is a (linear) positive semi-definite operator of unit trace on $H_A$, it is a density operator.
$^\dagger$ That this generalizes your definition can be seen as follows (again, working with on a finite-dimensional complex Hilbert space):
If an operator $\sigma$ is positive semi-definite, it is hermitian (cf. the last part of this answer) and thus admits a spectral decomposition:
$$\sigma = \sum\limits_i \lambda_i\, |\lambda_i\rangle \langle \lambda_i| \quad , \tag{6}$$
with $\lambda_i \geq 0$. The trace condition now ensures that $ \sum\limits_i \lambda_i = 1$, so it is of the form given by your equation $(1)$ and hence a density matrix. Conversely, for an operator of the form
$$\sigma := \sum\limits_i p_i\, |\psi_i\rangle \langle \psi_i| \tag{7}$$
with $ \sum\limits_i p_i=1$, $p_i \geq 0$ and $|\psi_i\rangle$ normalized (but not necessarily orthogonal), i.e. a density matrix, it is not hard to prove that $\sigma \geq 0$ and $\mathrm{Tr}\, \sigma =1$.
To summarize: Proving that $\rho_A$ is positive semi-definite and of unit trace shows that it admits a representation of the form $(6)$ and thus proves that it is a density operator in terms of your definition.
Best Answer
"The system admits a wave-function description" means there is just one single state (or "wavefunction") $\lvert \psi\rangle$ the system is in with absolute certainty. So in that case, the set of your $\lvert i\rangle$ is just the single state $\lvert \psi\rangle$ and it occurs with statistical weight $w_\psi = 1$. Obviously $$ \rho = w_\psi\lvert \psi\rangle\langle \psi\rvert = \lvert \psi\rangle\langle\psi\rvert = P_\psi$$ in this case.