There's a reason that definition does not require that the map $\phi$ in a chart $(U,\phi)$ be a diffeomorphism: that would require knowing already that $M$ is a smooth manifold, but since that is what is being defined, the definition would become circular.
However, once a smooth manifold $(M,\mathcal{A})$ is defined, then one can move forward and define smooth functions on open subsets of $M$. Namely, for each open set $W \subset M$, a function $\xi : W \to \mathbb{R}^k$ is smooth if and only if for each chart $(U,\phi)$ in the atlas $\mathcal{A}$ the map $\xi \circ \phi^{-1} : \phi(W \cap U) \to \mathbb{R}^k$ is smooth. And then, by applying the definition of a smooth atlas, it is now an easy lemma to prove that if $(U,\phi)$ is a chart in the atlas $\mathcal{A}$ then $\phi : U \to \mathbb{R}^m$ is indeed smooth.
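For completeness, here is a sketch of that easy lemma in the terms just set up (my phrasing of the standard argument):

```latex
% Lemma: for a chart (U,\phi) in \mathcal{A}, the map \phi : U \to \mathbb{R}^m
% is smooth in the sense defined above.
% Proof sketch: take W = U and \xi = \phi. For any chart (V,\psi) \in \mathcal{A},
\[
  \xi \circ \psi^{-1} \;=\; \phi \circ \psi^{-1} :
  \psi(U \cap V) \longrightarrow \phi(U \cap V) \subset \mathbb{R}^m,
\]
% which is a transition map of the atlas \mathcal{A}, hence smooth by the
% very definition of a smooth atlas.
```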
The usual argument for the fact that given a rank $r$ matrix $A$ we can find invertible $P,Q$ such that
$$PAQ = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} $$
is obtained by applying row operations (encoded in $P$) and column operations (encoded in $Q^T$) to $A$ until you reach the desired form. If you have a family $A(t)$ of matrices of constant rank $r$ which depends smoothly on a parameter $t$, it is not clear that this argument can be applied to get smooth families $P(t),Q(t)$ such that
$$P(t)A(t)Q(t) = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}. $$
You might think that we can find $P(0),Q(0)$ for $A(0)$ and then take $P(t) \equiv P(0)$, $Q(t) \equiv Q(0)$, but by playing with any non-trivial example you will see that this must fail.
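To make this concrete, here is a small sketch with a hypothetical family (my example, not from the answer above): $A(t) = \begin{pmatrix} 1 & t \\ 0 & 0 \end{pmatrix}$ has rank $1$ for every $t$, the identity matrices work as $P(0), Q(0)$, yet freezing them fails for $t \neq 0$, while a $t$-dependent $Q(t)$ succeeds.

```python
# A hypothetical family A(t) of constant rank 1, illustrating why
# the frozen choice P(0), Q(0) cannot work for all t.

def matmul(X, Y):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def A(t):
    # rank 1 for every t: the second row is zero, the first is nonzero
    return [[1.0, t], [0.0, 0.0]]

I = [[1.0, 0.0], [0.0, 1.0]]          # P(0) = Q(0) = identity
canonical = [[1.0, 0.0], [0.0, 0.0]]  # target form for r = 1

# At t = 0 the constant choice works ...
assert matmul(matmul(I, A(0.0)), I) == canonical
# ... but at t = 1 it does not: P(0) A(1) Q(0) = [[1, 1], [0, 0]].
assert matmul(matmul(I, A(1.0)), I) != canonical

# A smooth t-dependent Q(t) repairs this: the column operation
# "subtract t times column 1 from column 2" is Q(t) = [[1, -t], [0, 1]].
def Q(t):
    return [[1.0, -t], [0.0, 1.0]]

for t in [0.0, 0.5, 1.0, -3.0]:
    assert matmul(matmul(I, A(t)), Q(t)) == canonical
```

Here the repair happens to exist globally in $t$; in general, as the answer notes, one only expects such smooth families locally.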
How is this related to your situation? After choosing local coordinates, you have a family of matrices $dF|_{(x^1,\dots,x^m)}$ which has constant rank $r$ and depends smoothly on $(x^1,\dots,x^m)$. You suggest that by solving the problem for a specific value of $(x^1,\dots,x^m)$ you have solved the problem for all the other matrices, but this is false. From this perspective, the proof of the constant rank theorem brings the matrices $dF|_{(x^1,\dots,x^m)}$ to a canonical form that works for all values of $(x^1,\dots,x^m)$ (in a small enough neighborhood).
It is instructive to see how, when $M = \mathbb{R}^m$, $N = \mathbb{R}^n$ and $F$ is linear, the standard proof of the constant rank theorem recovers the familiar result from linear algebra. The point is that the way it proves this familiar result is better suited to bringing a family of matrices (instead of a single one) to a canonical form, so from this point of view the proof is a natural generalization of the linear algebra result.
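A hedged sketch of that linear special case (my notation, under the assumption that the coordinate changes in the proof can be taken linear when $F$ is linear): if $F(x) = Ax$ with $\operatorname{rank} A = r$, then $dF|_x = A$ for every $x$, and the charts $\phi, \psi$ produced by the theorem play the roles of $Q^{-1}$ and $P$.

```latex
% If F(x) = Ax, then dF|_x = A for all x, so the family of differentials
% is constant. Taking the linear coordinate changes \phi = Q^{-1}, \psi = P:
\[
  \psi \circ F \circ \phi^{-1}(x) \;=\; P A Q\, x,
  \qquad
  P A Q = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix},
\]
% i.e. in the new coordinates F is the normal form
% (x^1, \dots, x^m) \mapsto (x^1, \dots, x^r, 0, \dots, 0).
```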
Best Answer
Assume $\text{dim}\ M=\text{dim}\ N=n.$ In what follows, we use the Einstein convention for all sums.
The point is that if $(U_p, \phi_p)$ is a chart about $p$ in $M$ and $(V_{f(p)}, \psi_{f(p)})$ is a chart about $f(p)$ in $N$ then $(f_*)_p: T_pM \to T_{f(p)}N$ is a linear transformation, so it has a matrix representation in the coordinates defined by $\phi$ and $\psi$. If we can show that this matrix is the Jacobian of $\hat f:=\psi_{f(p)} \circ f \circ \phi_p^{-1}$, which is $\left(\frac{\partial \hat f^j}{\partial r^i}\right)_{ij}$, then this Jacobian is invertible precisely when $(f_*)_p$ is an isomorphism, and in that case the inverse function theorem makes $\hat f$ a local diffeomorphism, which is what we want.
But, $\textit{by definition},\ \frac{\partial }{\partial x^i}=(\phi_*)^{-1}\frac{\partial }{\partial r^i}$, where $(r^i)$ are the usual Euclidean coordinates. Similarly, $\frac{\partial }{\partial y^i}=(\psi_*)^{-1}\frac{\partial }{\partial s^i}$ where we use $(s^i)$ to represent the Euclidean coordinates in the range of $\hat f$ just to make the calculations easier to follow. For the same reason, we drop the subscripts $p$ and $f(p).$ Finally, we note that by the chain rule in $\mathbb R^n,\ \hat f_*\frac{\partial }{\partial r^i}=\frac{\partial \hat f^j}{\partial r^i}\frac{\partial}{\partial s^j},$ where the $(\hat f^j)$ are the components of $\hat f$. Then, we calculate
$$\begin{aligned}
f_*\frac{\partial }{\partial x^i} &= f_*\circ (\phi_*)^{-1}\frac{\partial }{\partial r^i}
= (f\circ \phi^{-1})_*\frac{\partial }{\partial r^i}
= (\psi^{-1}\circ \hat f)_*\frac{\partial }{\partial r^i}
= (\psi_*)^{-1}\circ \hat f_*\frac{\partial }{\partial r^i} \\
&= (\psi^{-1})_*\left(\frac{\partial \hat f^j}{\partial r^i}\frac{\partial}{\partial s^j}\right)
= \frac{\partial \hat f^j}{\partial r^i}\,(\psi^{-1})_*\frac{\partial}{\partial s^j}
= \frac{\partial \hat f^j}{\partial r^i}\frac{\partial}{\partial y^j}.
\end{aligned}$$
It follows that the matrix of $f_*$ is the Jacobian of $\hat f,$ as desired.