First, we use two charts, $(U,h)$ with $h(p) = 0 \in \mathbb{R}^m$, and $(V,g_1)$ with $g_1(\phi(p)) = 0 \in \mathbb{R}^n$, to transport the problem to Euclidean space. Shrinking $U$ if necessary to have $\phi(U) \subset V$ (possible since $\phi$ is continuous), we obtain a smooth $\psi = g_1 \circ \phi \circ h^{-1} \colon \underbrace{h(U)}_{W} \to \mathbb{R}^n$ with $\psi(0) = 0$, and the rank of $J(\psi)(0)$ being $m$.
From this point on, we only consider local diffeomorphisms of $\mathbb{R}^n$ at $0$ - and possibly shrink $W$ if necessary - to achieve the desired representation.
If $n > m$, it need not be the case that the first $m$ rows of $J(\psi)(0)$ are linearly independent, so we need a permutation of the coordinates of $\mathbb{R}^n$ to achieve that. For $\pi \in S_n$, the map
$$P_\pi \colon (x_1,\dotsc,x_n) \mapsto (x_{\pi(1)},\dotsc,x_{\pi(n)})$$
is a diffeomorphism of $\mathbb{R}^n$ (with inverse $P_{\pi^{-1}}$), and for some $P = P_\pi$, the composition $\chi = P\circ \psi$ has the first $m$ rows of $J(\chi)(0) = J(P)(0)\cdot J(\psi)(0)$ linearly independent.
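As a small illustration (the curve below is a hypothetical example, not part of the original argument), take $m = 1$, $n = 2$ and $\psi(t) = (t^2, e^t - 1)$. Then
$$J(\psi)(0) = \begin{bmatrix} 0 \\ 1 \end{bmatrix},$$
so the first row vanishes although the rank is $1$. With the transposition $\pi = (1\,2)$, the map $P(x_1,x_2) = (x_2,x_1)$ gives
$$\chi(t) = (P\circ\psi)(t) = (e^t - 1,\, t^2),\qquad J(\chi)(0) = \begin{bmatrix} 1 \\ 0 \end{bmatrix},$$
and now the first row is linearly independent.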
Now we are - except for notation - in the "consider the case" situation. We consider the map
$$F \colon W\times \mathbb{R}^{n-m} \to \mathbb{R}^n,\quad F(x_1,\dotsc,x_n) =
\begin{pmatrix}\chi_1(x_1,\dotsc,x_m)\\ \vdots\\ \chi_m(x_1,\dotsc,x_m)\\x_{m+1}\\ \vdots\\ x_n\end{pmatrix}.$$
$F$ is smooth, and its Jacobi matrix at $0$ is
$$J(F)(0) = \begin{bmatrix} J(\chi^{(m)})(0) & 0 \\ 0 & I\end{bmatrix},$$
where $\chi^{(m)}$ denotes the first $m$ components of $\chi$. By construction, $J(\chi^{(m)})(0)$ is invertible, hence $J(F)(0)$ is invertible, and $F$ is a local diffeomorphism of $\mathbb{R}^n$ at $0$ by the inverse function theorem, say $F\lvert_{V_1}\colon V_1 \to V_2$ is a diffeomorphism with $V_1,V_2$ open neighbourhoods of $0$ in $\mathbb{R}^n$.
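To see this step in a concrete case, assume $m = 1$, $n = 2$, $W = \mathbb{R}$ and $\chi(t) = (e^t - 1,\, t^2)$ (an illustrative choice, not taken from the argument above). Then
$$F(x_1,x_2) = (e^{x_1} - 1,\, x_2),\qquad J(F)(0) = \begin{bmatrix} 1 & 0 \\ 0 & 1\end{bmatrix},$$
and in this example one can even take $V_1 = \mathbb{R}^2$ and $V_2 = (-1,\infty)\times\mathbb{R}$, with
$$(F\lvert_{V_1})^{-1}(y_1,y_2) = (\log(1+y_1),\, y_2).$$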
We shrink $W$ if necessary to have $\chi(W) \subset V_2$, and consider $\eta = (F\lvert_{V_1})^{-1} \circ \chi$. We have
$$\begin{pmatrix} \chi_1(x') \\ \vdots \\ \chi_m(x') \\ \chi_{m+1}(x')\\ \vdots \\ \chi_n(x')\end{pmatrix} = \chi(x') = (F\circ \eta)(x') = \begin{pmatrix} \chi_1(\eta^{(m)}(x'))\\ \vdots\\ \chi_m(\eta^{(m)}(x'))\\\eta_{m+1}(x')\\\vdots\\ \eta_n(x')\end{pmatrix},$$ where $\eta^{(m)}$ denotes the first $m$ components of $\eta$ and $x' \in W$. Since $\chi^{(m)}$ is a local diffeomorphism of $\mathbb{R}^m$ at $0$, it is injective on some neighbourhood of $0$. As $\eta^{(m)}$ is continuous with $\eta^{(m)}(0) = 0$, comparing the first $m$ components above gives $\eta^{(m)}(x') = x'$ for all $x'$ in some neighbourhood $W'\subset \mathbb{R}^m$ of $0$. Shrink $W$ if necessary to assume $W = W'$. So we have
$$\eta(x_1,\dotsc,x_m) = (x_1,\dotsc, x_m, \eta_{m+1}(x_1,\dotsc,x_m),\dotsc, \eta_n(x_1,\dotsc,x_m)),$$
or, by slight abuse of notation, $\eta(x') = (x',\tilde{\eta}(x'))$.
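For example, with the illustrative data $m = 1$, $n = 2$, $\chi(t) = (e^t - 1,\, t^2)$, and hence $F(x_1,x_2) = (e^{x_1} - 1,\, x_2)$ (assumed here only for illustration), one finds
$$\eta(t) = (F\lvert_{V_1})^{-1}(\chi(t)) = (\log(e^t),\, t^2) = (t,\, t^2),$$
so that $\eta^{(m)}(t) = t$ and $\tilde{\eta}(t) = t^2$.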
Now consider the map
$$G\colon W\times \mathbb{R}^{n-m} \to \mathbb{R}^n,\quad (x',x'') \mapsto (x', x'' - \tilde{\eta}(x')).$$
We have
$$J(G)(0) = \begin{bmatrix} I & 0 \\ -J(\tilde{\eta})(0) & I\end{bmatrix},$$
which is evidently invertible, so the restriction of $G$ is a diffeomorphism between two open neighbourhoods $V_3$ and $V_4$ of $0$ in $\mathbb{R}^n$. If necessary, shrink $V_1$ and accordingly $V_2$, and consequently $W$ so that $V_1 \subset V_3$.
Then $\iota = G\circ \eta \colon x' \mapsto G(x',\tilde{\eta}(x')) = (x', \tilde{\eta}(x') - \tilde{\eta}(x')) = (x',0)$ is the representation we wanted, and unraveling the construction,
$$\iota = G\circ \eta = G\circ (F\lvert_{V_1})^{-1} \circ \chi = G\circ (F\lvert_{V_1})^{-1} \circ P \circ \psi = G\circ (F\lvert_{V_1})^{-1} \circ P \circ g_1 \circ \phi \circ h^{-1},$$
we see that the chart
$$g = G\circ (F\lvert_{V_1})^{-1} \circ P \circ g_1$$
gives the desired representation.
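As a sanity check on a hypothetical example (not from the original argument), take $m = 1$, $n = 2$, $h = \mathrm{id}$, $g_1 = \mathrm{id}$ and $\phi(t) = \psi(t) = (t^2,\, e^t - 1)$. Following the construction, $P(y_1,y_2) = (y_2,y_1)$, $(F\lvert_{V_1})^{-1}(y_1,y_2) = (\log(1+y_1),\, y_2)$ and $G(x_1,x_2) = (x_1,\, x_2 - x_1^2)$, so on $\{y_2 > -1\}$
$$g(y_1,y_2) = \bigl(\log(1+y_2),\; y_1 - \log(1+y_2)^2\bigr),$$
and indeed
$$(g\circ\phi)(t) = \bigl(\log(e^t),\; t^2 - t^2\bigr) = (t, 0).$$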
In particular, we see that for every chart $h$ around $p$ with $h(p) = 0$, there exists a chart $g$ around $\phi(p)$ such that $g\circ \phi \circ h^{-1}$ has the desired form on a neighbourhood of $0$ in $\mathbb{R}^m$.
Best Answer
Shouldn't the rank be constant on a neighborhood of that point to make this conclusion? If the rank is maximal at the point, it is sufficient to check it there, because it is then automatically constant nearby. Otherwise the rank can "jump" up suddenly, meaning you would need more coordinates. It can't jump down suddenly, because the minors of the Jacobian are determinants that depend continuously on the point: a continuous function that is nonzero at a point stays nonzero in a neighborhood of it, but a function that is zero at a point can be nonzero at points arbitrarily close to it.
Otherwise your intuition seems more or less correct. A useful word here is "locally": "The constant rank theorem says that a smooth function with locally constant rank is locally a linear map of that same rank, up to diffeomorphism."
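A standard one-variable example of the rank jumping up (added here as an illustration) is $f\colon \mathbb{R}\to\mathbb{R}$, $f(x) = x^2$:
$$J(f)(x) = \begin{bmatrix} 2x \end{bmatrix},\qquad \operatorname{rank} J(f)(x) = \begin{cases} 0, & x = 0,\\ 1, & x \neq 0.\end{cases}$$
If the conclusion held at $0$ with rank $0$, then $f$ would be locally a linear map of rank $0$ up to diffeomorphism, that is, constant near $0$, which it is not. At $x = 1$, on the other hand, the rank is maximal and stays equal to $1$ on a whole neighborhood.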