In this post all manipulations are at a rather formal level and no deep mathematical discussion takes places. For that see the references at the end.
To start, consider a bipartite Hilbert space $H=H_A\otimes H_B$ and define the operator $\mathbb I_A \otimes |\psi\rangle : H_A\longrightarrow H $ for some $|\psi\rangle \in H_B$ and the identity operator $\mathbb I_A$ on $H_A$, such that for $|\varphi\rangle \in H_A$ it holds that
$$ \left(\mathbb I_A \otimes |\psi\rangle \right) |\varphi\rangle := |\varphi\rangle \otimes |\psi\rangle \quad . \tag{1}$$
Its adjoint is the following operator $\left(\mathbb I_A \otimes \langle \psi| \right) : H \longrightarrow H_A$, where for $|\varphi\rangle \in H_A, |\phi\rangle \in H_B$ we have
$$ \left(\mathbb I_A \otimes \langle \psi| \right) (|\varphi\rangle \otimes|\phi\rangle) = |\varphi\rangle \langle \psi|\phi \rangle \quad . \tag{2}$$
For a density operator $\rho$ on $H$ we define the partial trace of $\rho$ as
$$\rho_A:=\mathrm{Tr_B}\,\rho := \sum\limits_{k \in K} \left(\mathbb I_A \otimes \langle \psi_k| \right) \rho\left(\mathbb I_A \otimes |\psi_k\rangle \right) \quad , \tag{3}$$
where $\{|\psi_k\rangle\}_{k\in K}$ for an index set $K$ is an orthonormal basis in $H_B$. Note that $\rho_A$ is a linear operator on $H_A$. We can similarly define an operator $\rho_B:=\mathrm{Tr}_A\,\rho$.
Instead of $(3)$ people often write, as a shorthand, something like
$$\rho_A= \sum\limits_{k \in K} \langle \psi_k| \rho |\psi_k\rangle \quad , \tag{4} $$
which can indeed be confusing. Regarding your second concern: I agree (cf. the calculations below), but I guess this could also be a matter of definition and convention, although I've not seen this before. In the end, something like $\langle \varphi|\otimes \langle \psi|$ should denote an element of $H^*$ with $ \langle \varphi|\otimes \langle \psi| \left(|\alpha\rangle \otimes |\beta\rangle\right) = \langle \varphi|\alpha\rangle \langle \psi|\beta\rangle$, for $|\varphi\rangle,|\alpha\rangle \in H_A$ and $|\psi\rangle, |\beta\rangle \in H_B$. So if we put elements of $H_A$ in the first slot of $\otimes$, i.e. on the left of side the tensor product symbol, then it seems natural to me to write $\langle \varphi|\otimes \langle \psi| \in H^*$ instead of $\langle \psi|\otimes \langle \varphi|$.
To proceed, let us now do the explicit calculations: For an operator $O_A$ on $H_A$ we compute
\begin{align}
\mathrm{Tr^{(A)}} \rho_A\,O_A &= \sum\limits_{j\in J} \langle \varphi_j | \rho_A \, O_A|\varphi_j\rangle\\
&= \sum\limits_{j\in J} \langle \varphi_j | \sum\limits_{k \in K} \left(\mathbb I_A \otimes \langle \psi_k| \right) \rho\left(\mathbb I_A \otimes |\psi_k\rangle \right) \, O_A |\varphi_j\rangle \\
&=\sum\limits_{j\in J}\langle \varphi_j | \sum\limits_{k \in K} \left(\mathbb I_A \otimes \langle \psi_k| \right) \rho\, O_A |\varphi_j\rangle \otimes |\psi_k\rangle\tag{5}\\
&=\sum\limits_{j\in J}\sum\limits_{k\in K}\langle \varphi_j|\otimes \langle \psi_k| \,\rho\,O \,|\varphi_j\rangle \otimes |\psi_k\rangle \\
&= \mathrm{Tr}\rho\, O \quad .
\end{align}
Here, $\{|\varphi_j\rangle\}_{j\in J}$ denotes an orthonormal basis in $H_A$, $\mathrm{Tr}^{(A)}$ the trace operation on $H_A$ and $O:= O_A\otimes \mathbb I_B$.
For a more rigorous treatment, see for example S. Attal. Tensor products and partial traces. Lecture Notes, especially section $2.3$. or Michael M. Wolf. Mathematical Introduction to
Quantum Information Processing. Lecture notes, especially theorem 1.35. You can find a pdf for the first reference here and for the second here.
Density matrices are, by definition, positive semi-definite trace-class operators of unit trace, see for example this. In the following we will only discuss the finite-dimensional case, so we have to show that the reduced density matrix is positive semi-definite and of unit trace.$^\dagger$
To start, consider a complex bipartite Hilbert space $H=H_A\otimes H_B$ of finite dimension. Let $\rho$ be a density operator on $H$. In particular it holds that
$$
\langle \psi|\rho|\psi\rangle \geq 0 \tag{1}$$
for all $|\psi\rangle \in H$ and
$$\mathrm{Tr} \rho =1 \tag{2} \quad .$$
We define the reduced density operator $\rho_A$ of $\rho$ as
$$ \rho_A:=\mathrm{Tr}_B\,\rho\tag{3} := \sum\limits_{k \in K} \left( \mathbb I_A\otimes\langle \psi_k|\right) \,\rho\, \left(\mathbb I_A\otimes |\psi_k\rangle\right) \quad , $$
for some orthonormal basis $\{|\psi_k\rangle\}_{k\in K}$ in $H_B$, cf. here. Note that $\rho_A$ is a linear operator on $H_A$. We compute its trace:
$$\mathrm{Tr}^{(A)} \rho_A = \sum\limits_{j \in J}\sum\limits_{k \in K} \langle \varphi_j|\otimes \langle \psi_k| \, \rho\, |\varphi_j\rangle \otimes |\psi_k\rangle = \mathrm{Tr} \rho \overset{(2)}{=} 1 \quad , \tag{4} $$
where $\{|\varphi_j\rangle\}_{j\in J}$ and $\{|\varphi_j\rangle \otimes |\psi_k\rangle\}_{j\in J,k\in K}$ are orthonormal bases in $H_A$ and $H$, respectively. We are left to show that $\rho_A$ is positive semi-definite. To do so, define $|\phi_k\rangle := |\varphi\rangle \otimes |\psi_k\rangle$ for an arbitrary $|\varphi\rangle \in H_A$. Then
$$0 \overset{(1)}{\leq}\sum\limits_{k\in K}\langle \phi_k|\rho|\phi_k\rangle = \sum\limits_{k \in K} \langle \varphi| \otimes \langle \psi_k|\,\rho\,|\varphi\rangle \otimes |\psi_k\rangle\overset{(3)}{=}\langle \varphi| \rho_A |\varphi\rangle \tag{5} \quad . $$
In conclusion, since $\rho_A$ is a (linear) positive semi-definite operator of unit trace on $H_A$, it is a density operator.
$^\dagger$ That this generalizes your definition can be seen as follows (again, working with on a finite-dimensional complex Hilbert space):
If an operator $\sigma$ is positive semi-definite, it is hermitian (cf. the last part of this answer) and thus admits a spectral decomposition:
$$\sigma = \sum\limits_i \lambda_i\, |\lambda_i\rangle \langle \lambda_i| \quad , \tag{6}$$
with $\lambda_i \geq 0$. The trace condition now ensures that $ \sum\limits_i \lambda_i = 1$, so it is of the form given by your equation $(1)$ and hence a density matrix. Conversely, for an operator of the form
$$\sigma := \sum\limits_i p_i\, |\psi_i\rangle \langle \psi_i| \tag{7}$$
with $ \sum\limits_i p_i=1$, $p_i \geq 0$ and $|\psi_i\rangle$ normalized (but not necessarily orthogonal), i.e. a density matrix, it is not hard to prove that $\sigma \geq 0$ and $\mathrm{Tr}\, \sigma =1$.
To summarize: Proving that $\rho_A$ is positive semi-definite and of unit trace shows that it admits a representation of the form $(6)$ and thus proves that it is a density operator in terms of your definition.
Best Answer
The reduced density matrix allows to compute all expectation values for the subsystem alone, in other words: For all hermitian operators $O_A$ on $H_A$ we have that
$$ \mathrm{Tr}\, \rho \,O_A \otimes \mathbb I_B= \mathrm{Tr}^{(A)}\rho_A O_A \tag{1} \quad .$$
This, in particular, includes the calculation of probabilities of measurement results of local observables: Write the observable $O_A$ in its spectral representation:
$$O_A=\sum\limits_j o_j\, P_j \tag{2} \quad .$$
The probability to measure $o_j$ is then given by $$\mathrm{Tr}\,\rho\, P_j \otimes \mathbb I_B = \mathrm{Tr}^{(A)}\rho_A P_j \tag{3} \quad . $$ To emphasize: The left-hand side is a postulate, the right-hand side a trivial consequence of $(1)$ (which itself is a consequence of the definition of $\rho_A$). The state of the bipartite system after the measurement is $$ \rho^\prime \propto P_j \otimes \mathbb I_B \, \rho\,P_j \otimes \mathbb I_B \tag{4}$$ with the associated reduced density matrix
$$\rho_A^\prime= \mathrm{Tr}_B \,\rho^\prime \propto P_j\, \rho_A\, P_j \quad . \tag{5} $$ Again: Equation $(4)$ is a postulate, equation $(5)$ is a consequence of it. Indeed, the above follows from the observation that for $|\psi\rangle \in H_B$ it holds that
$$\left(P_j \otimes \mathbb I_B\right) \left(\mathbb I_A\otimes |\psi\rangle\right) =P_j \otimes |\psi\rangle = \left(\mathbb I_A \otimes |\psi\rangle\right)\, P_j \tag{6}$$
and $$ \left(\mathbb I_A\otimes \langle \psi| \right) \left(P_j \otimes \mathbb I_B\right) = P_j \otimes \langle \psi| = P_j\, \left(\mathbb I_A\otimes \langle \psi|\right) \quad . \tag{7} $$
Hence, the knowledge of $\rho_A$ suffices to compute all observable properties of the subsystem corresponding to $H_A$.
Source and further reading: J. Audretsch. Entangled Systems: new directions in quantum physics. Wiley, especially chapter 7. A pdf of the relevant chapter can be found here.