"Chart independent" does not mean what you think it means.
A smooth vector field $v$ on a manifold $M$ is chart independent: treating it as a derivation on smooth functions, $v(f)$ is a scalar independent of which chart you compute it in. But the expression of the vector field $v$ in a coordinate basis $v = \sum v^i \partial_i$ clearly depends on the choice of chart. In fact, any nonzero element $k\in T_pM$, when written in a coordinate basis, will depend on the choice of chart. This is simply because different coordinate charts give rise to possibly different coordinate bases of $T_pM$, and the only element of a vector space whose coordinate representation is the same in all bases is the $0$ element.
In other words, "chart independent" should be taken to mean that the object transforms naturally (co- or contravariantly) under changes of basis in the (co)tangent spaces, or equivalently under changes of charts.
As you can easily verify: if $k\in T_pM$, then $DF(k) \in T_{F(p)}N$ is independent of the choice of charts, since the change in the coordinate representation of $k$ between the charts $(U,\varphi)$ and $(U',\varphi')$ precisely cancels the factor $D(\varphi\circ\varphi'^{-1})$ that appears.
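Concretely, writing $(V,\psi)$ and $(V',\psi')$ for two charts around $F(p)$ (these chart names are my own, introduced for illustration), the chain rule gives, with each derivative evaluated at the appropriate point,
$$D(\psi'\circ F\circ\varphi'^{-1}) = D(\psi'\circ\psi^{-1})\, D(\psi\circ F\circ\varphi^{-1})\, D(\varphi\circ\varphi'^{-1}).$$
So when the primed representation of $DF$ is applied to the primed representation $D(\varphi'\circ\varphi^{-1})k$ of $k$, the factor $D(\varphi\circ\varphi'^{-1})$ cancels against $D(\varphi'\circ\varphi^{-1})$, and what remains is $D(\psi'\circ\psi^{-1})$ applied to the unprimed representation of $DF(k)$: exactly the change of basis on $T_{F(p)}N$.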
The usual argument for the fact that given a rank $r$ matrix $A$ we can find invertible $P,Q$ such that
$$PAQ = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} $$
is obtained by applying row operations (encoded in $P$) and column operations (encoded in $Q^T$) to $A$ until you get to the desired form. If you have a family $A(t)$ of matrices of constant rank $r$ which depends smoothly on a parameter $t$, it is not clear that this argument can be applied to get smooth families $P(t),Q(t)$ such that
$$P(t)A(t)Q(t) = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}. $$
You might think that we can find $P(0),Q(0)$ for $A(0)$ and then take $P(t) \equiv P(0)$, $Q(t) \equiv Q(0)$, but by playing with any nontrivial example you will see that this must fail.
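Here is a minimal numerical sketch of that failure, using a rank-$1$ family $A(t) = (\cos t \;\; \sin t)$ of my own choosing (any nonconstant example would do):

```python
import numpy as np

# A hypothetical smooth family of 1x2 matrices, each of rank 1.
def A(t):
    return np.array([[np.cos(t), np.sin(t)]])

# At t = 0, A(0) = [1, 0] is already in the canonical form (I_1 | 0),
# so the constant choices P(0) = I_1, Q(0) = I_2 work there.
P0 = np.eye(1)
Q0 = np.eye(2)
canonical = np.array([[1.0, 0.0]])

assert np.allclose(P0 @ A(0.0) @ Q0, canonical)

# But the same constant P, Q fail at nearby parameter values:
# P0 @ A(0.5) @ Q0 is [[cos 0.5, sin 0.5]], not [[1, 0]].
assert not np.allclose(P0 @ A(0.5) @ Q0, canonical)
```

In this toy example a smooth correction is easy to find by hand (take $Q(t)$ to be the rotation by $t$), but the point stands: the constant choice fails, and producing smooth $P(t), Q(t)$ in general is exactly the extra work the constant rank theorem does.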
How is this related to your situation? After choosing local coordinates, you have a family of matrices $dF|_{(x^1,\dots,x^m)}$ which have constant rank $r$ and depend smoothly on $(x^1,\dots,x^m)$. You suggest that by solving the problem for a specific value of $(x^1,\dots,x^m)$ you have solved it for all the other matrices, but this is false. From this perspective, the proof of the constant rank theorem brings the matrices $dF|_{(x^1,\dots,x^m)}$ to a canonical form that works for all values of $(x^1,\dots,x^m)$ (in a small enough neighborhood).
It is instructive to see how, when $M = \mathbb{R}^m$, $N = \mathbb{R}^n$, and $F$ is linear, the standard proof of the constant rank theorem recovers the familiar result from linear algebra. The point is that the way it proves that result is better suited to bringing a family of matrices (instead of a single one) to a canonical form, so from this point of view the proof is a natural generalization of the linear algebra result.
As I understand your question, you want to know why the definition $F_*X(f) := X(f \circ F)$ is an appropriate generalization of the total derivative. In other words, knowing only the definition of the total derivative, how would one come to this definition of pushforward?
Qiaochu's comment is the key: it comes down to the way directional derivatives relate to derivations. Let's flesh out this idea by recalling some multivariable calculus.
Let $F\colon \mathbb{R}^m \to \mathbb{R}^n$ be smooth, and let $D_pF\colon \mathbb{R}^m \to \mathbb{R}^n$ denote the total derivative at $p \in \mathbb{R}^m$. To each vector $w \in \mathbb{R}^n$ (based at $F(p)$), we associate the derivation at $F(p) \in \mathbb{R}^n$ via: $$w \in \mathbb{R}^n \mapsto w^j \left.\frac{\partial}{\partial x^j}\right|_{F(p)}.$$ In particular, for $v \in \mathbb{R}^m$ (based at $p$), $$D_pF(v) \in \mathbb{R}^n \mapsto D_pF(v)^j \left.\frac{\partial}{\partial x^j}\right|_{F(p)}.$$ And in fact, this derivation on the right-hand side is none other than $$\left.v^i\frac{\partial}{\partial x^i}\right|_p(-\circ F).$$
To see this, we just use the chain rule: $$\begin{align*} v^i \left.\frac{\partial}{\partial x^i}\right|_p(-\circ F) & = v^i \left.\frac{\partial F^j}{\partial x^i}\right|_p \left.\frac{\partial}{\partial x^j}\right|_{F(p)} \\ & = v^i D_pF(e_i)^j \left.\frac{\partial}{\partial x^j}\right|_{F(p)} \\ & = D_pF(v)^j \left.\frac{\partial}{\partial x^j}\right|_{F(p)} \end{align*}$$
Alternatively, I believe it also suffices to note that both derivations give the same value when applied to the coordinate function $x^k$: $$D_pF(v)^j\frac{\partial x^k}{\partial x^j} = D_pF(v)^k = v^i D_pF(e_i)^k = \left.v^i\frac{\partial F^k}{\partial x^i}\right|_p = v^i\left.\frac{\partial}{\partial x^i}\right|_p(x^k \circ F).$$
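The computation above can be checked symbolically. The following sketch (my own example map $F$, function $f$, point $p$, and vector $v$, chosen arbitrarily) verifies that the derivation $v^i\,\partial/\partial x^i|_p(-\circ F)$ agrees with $D_pF(v)^j\,\partial/\partial x^j|_{F(p)}$:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')   # coordinates on the source R^2
y1, y2 = sp.symbols('y1 y2')   # coordinates on the target R^2

# A sample smooth map F: R^2 -> R^2 and a sample test function f.
F = sp.Matrix([x1**2 + x2, x1 * x2])
f = y1**2 + sp.sin(y2)

p = {x1: 1, x2: 2}             # base point p
v = sp.Matrix([3, -1])         # tangent vector v at p

# Right-hand side of the identity: v^i d(f o F)/dx^i evaluated at p.
f_of_F = f.subs({y1: F[0], y2: F[1]})
rhs = sum(v[i] * sp.diff(f_of_F, var) for i, var in enumerate([x1, x2]))
rhs = rhs.subs(p)

# Left-hand side: (D_pF(v))^j df/dy^j evaluated at F(p).
J = F.jacobian([x1, x2]).subs(p)                  # total derivative D_pF
w = J * v                                         # D_pF(v)
Fp = {y1: F[0].subs(p), y2: F[1].subs(p)}         # the point F(p)
lhs = sum(w[j] * sp.diff(f, var).subs(Fp) for j, var in enumerate([y1, y2]))

assert sp.simplify(lhs - rhs) == 0                # the two derivations agree
```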