Following the proof given in Milnor's Topology from the Differentiable Viewpoint:
$\require{AMScd}$
$\begin{CD}
x \in U \subseteq M @>F>> c \in V \subseteq N \\
@V\phi VV @VV\psi V \\
\phi\left(U\right) \subseteq H^{m} @>>\psi F \phi^{-1}> \psi\left(c\right) \in \psi\left(V\right)
\end{CD}$
Because $c \in N$ is a regular value of $F$, for every $x \in F^{-1}\left(c\right) \subseteq M$, there are charts $\left(U,\phi\right)$ at $x$ in $M$ and $\left(V,\psi\right)$ at $c$ in $N$ such that $\psi F \phi^{-1}: \phi\left(U\right) \subseteq H^{m} \to \psi\left(V\right) \subseteq \mathbb{R}^{n}$ is smooth, and has a regular value at $\psi\left(c\right)$.
$\begin{CD}
\phi\left(U\right) \subset W \subseteq \mathbb{R}^{m} @>G>> \psi\left(c\right) \in \mathbb{R}^{n}
\end{CD}$
Let $W$ be an open subset of $\mathbb{R}^{m}$ such that $W \cap H^{m} = \phi\left(U\right)$, and let $G:W\to\mathbb{R}^{n}$ be a smooth extension of $\psi F \phi^{-1}$ to $W$. Shrinking $U$ and $W$ if necessary, we may assume that $G$ has no critical points in $W$: the derivative $DG$ is surjective at $\phi\left(x\right)$, and surjectivity of the derivative is an open condition. Thus, $\psi\left(c\right)$ is a regular value of $G$, and by the preimage theorem (for smooth manifolds without boundary), $Z := G^{-1}\left(\psi\left(c\right)\right)$ is an $(m-n)$-dimensional submanifold of $\mathbb{R}^{m}$.
Furthermore, since $G$ is constant on $Z$, $T_{a}Z \subseteq \ker\left\{DG\left(a\right)\right\}$ for every $a \in Z$. But $DG\left(a\right):\mathbb{R}^{m}\to\mathbb{R}^{n}$ is surjective, so the rank-nullity theorem gives $\dim \left(\ker\left\{DG\left(a\right)\right\}\right) = m-n = \dim T_{a}Z$. Thus, $T_{a}Z = \ker\left\{DG\left(a\right)\right\}$.
Now, define $\pi: Z \subseteq \mathbb{R}^{m} \to \mathbb{R}$ as $\left(x_{1},\ldots,x_{m}\right) \mapsto x_{m}$.
To show that $0 \in \mathbb{R}$ is a regular value of $\pi$:
Suppose otherwise. That is, suppose there exists $a \in Z \cap \partial H^{m}$ such that $d\pi_{a}:T_{a}Z \to \mathbb{R}$ is not surjective. Since the target is one-dimensional, $d\pi_{a} = 0$, so $\ker\left\{d\pi_{a}\right\} = T_{a}Z = \ker\left\{DG(a)\right\}$. But $d\pi_{a}$ is the restriction to $T_{a}Z$ of the projection onto the last coordinate, so $\ker\left\{d\pi_{a}\right\} \subseteq \mathbb{R}^{m-1} \times \left\{0\right\} = \partial H^{m}$. Thus, $\ker\left\{DG(a)\right\} \subseteq \partial H^{m}$ if $0$ is not a regular value of $\pi$.
Now, since $c \in N$ is a regular value of $F|_{\partial M}$ as well, arguing as before, we can show that $\bar{G} := G|_{W\cap\partial H^{m}}$ has a regular value at $\psi\left(c\right)$. That is, for every $a \in Z\cap\partial H^{m}$, $D\bar{G}\left(a\right): \mathbb{R}^{m-1} \to \mathbb{R}^{n}$ is surjective, and by the rank-nullity theorem, $\dim \left(\ker\left\{D\bar{G}\left(a\right)\right\}\right) = m-n-1$.
Finally, $D\bar{G}(a)$ is the restriction of $DG(a)$ to $\mathbb{R}^{m-1} \times \left\{0\right\}$, so $\ker\left\{D\bar{G}(a)\right\} = \ker\left\{DG(a)\right\} \cap \partial H^{m}$. Hence $\ker\left\{DG(a)\right\} \subseteq \partial H^{m}$ would imply $\ker\left\{DG(a)\right\} = \ker\left\{D\bar{G}(a)\right\}$, which is impossible because the two kernels have dimensions $m-n$ and $m-n-1$ respectively. Hence, $0$ must be a regular value of $\pi$.
Since $0 \in \mathbb{R}$ is a regular value of $\pi$, $\left\{z \in Z \mid \pi\left(z\right) \geq 0\right\} = \phi\left(U \cap F^{-1}\left(c\right)\right)$ is a manifold with boundary $\left\{z \in Z \mid \pi\left(z\right) = 0\right\} = \phi\left(U \cap F^{-1}\left(c\right) \cap \partial M \right)$.
$\phi$ being a diffeomorphism, $U \cap F^{-1}\left(c\right)$ is a manifold with boundary $U \cap F^{-1}\left(c\right) \cap \partial M$. Observing that this is true for every $x \in F^{-1}\left(c\right)$ completes the proof.
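As a sanity check, here is a minimal worked instance of the argument. The choices $M = H^{2} = \left\{(x,y) \mid y \geq 0\right\}$, $N = \mathbb{R}$, $F(x,y) = x^{2}+y^{2}$ and $c = 1$ are my own illustration, not taken from Milnor, and $\phi$, $\psi$ are taken to be the identity charts:

$$\begin{aligned}
G(x,y) &= x^{2}+y^{2}, & DG(x,y) &= \begin{pmatrix} 2x & 2y \end{pmatrix} \neq 0 \text{ on } Z = G^{-1}(1), \\
\bar{G}(x) &= x^{2}, & D\bar{G}(\pm 1) &= \pm 2 \neq 0, \\
T_{(\pm 1,0)}Z &= \ker \begin{pmatrix} \pm 2 & 0 \end{pmatrix} = \operatorname{span}\left\{(0,1)\right\}, & d\pi_{(\pm 1,0)}(0,1) &= 1 \neq 0.
\end{aligned}$$

So $1$ is a regular value of both $F$ and $F|_{\partial M}$, and $0$ is a regular value of $\pi(x,y) = y$ on the circle $Z$. Accordingly, $F^{-1}(1) = \left\{z \in Z \mid \pi(z) \geq 0\right\}$ is the closed upper semicircle, a $1$-manifold with boundary $\left\{(-1,0),(1,0)\right\} = F^{-1}(1) \cap \partial M$.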
To the best of my understanding of your question, you are asking the following:
Question 1. Suppose that $A, B$ are smooth manifolds and $f: A\to B$ is an immersion such that $f(A)$ happens to be a smooth submanifold of $B$ (when equipped with the subspace topology, which is a default in this setting). Does it follow that $f$ is an embedding?
The answer to this question is negative. The simplest example is $B=S^1\subset {\mathbb C}$, $A={\mathbb R}$ and $f(t)=e^{it}$.
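To spell out why this example works (a routine check, writing $f'$ for the derivative of $f$ viewed as a map into ${\mathbb C}$):

$$f'(t) = i e^{it} \neq 0 \ \text{ for all } t \in {\mathbb R}, \qquad f({\mathbb R}) = S^1, \qquad f(t + 2\pi) = f(t).$$

So $f$ is an immersion whose image is all of $S^1$, but $f$ is not injective and hence not an embedding.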
It is possible, however, that what you have in mind is different:
Question 2. Suppose that $A, B$ are smooth manifolds and $f: A\to B$ is an injective immersion such that $C=f(A)$ happens to be a smooth submanifold of $B$ (when equipped with the subspace topology). Does it follow that $f$ is an embedding?
This question has a positive answer; in fact, $f: A\to C$ is a diffeomorphism in this situation (as follows from the inverse mapping theorem).
Lastly, my suggestion is to avoid the terminology "immersed submanifold" when you are just learning Differential Topology. Instead, talk about "immersions" and "embeddings" of smooth manifolds, as well as "submanifolds." Tu is really doing his readers a disservice by introducing the terminology "an immersed submanifold" at an early stage. But this is just my opinion.
Edit. It seems that the correct reading of the question is:
Question 3. Suppose that $A, B$ are smooth manifolds, that $f: A\to B$ is an immersion and that $f$'s image $C=f(A)$ (with the subspace topology) is a topological manifold. Is it true that $C$ is a smooth submanifold of $B$?
This one has a positive answer too, and the proof is similar to the one in the case of Question 2.
Step 1. The topological manifold $C$ has the same dimension $a$ as $A$.
Proof. Suppose not. Let $U_j$, $j\in J$, be open subsets of $A$ covering $A$ such that $f|_{U_j}$ is an embedding $U_j\to B$ for each $j\in J$, where $J$ is a countable index set. (Such a cover exists because an immersion is locally an embedding and my definition of manifolds requires that they are 2nd countable.)
In particular, $f|U_j$ is 1-1. Thus, by the invariance of domain theorem, for each $j$, $f(U_j)$ is nowhere dense in $C$. Thus, $C$ is a union of countably many nowhere dense subsets, contradicting Baire's Theorem.
Step 2. $C$ is a smooth $a$-dimensional submanifold of $B$.
Proof. For each $U_j$ as above, again, by the invariance of domain theorem, $f(U_j)$ is open in $C$. But each $f(U_j)$ is a smooth submanifold of $B$.
Hence, for each $x\in f(U_j)$ there is a neighborhood $W_x$ of $x$ in $B$ and a diffeomorphism $h: W_x\to \mathbb{R}^b$ ($b$ is the dimension of $B$) sending $W_x\cap f(U_j)$ to an open subset of $\mathbb{R}^a\subset \mathbb{R}^b$. Since $f(U_j)$ is open in $C$, we may shrink $W_x$ so that $W_x\cap C = W_x\cap f(U_j)$. Thus, $C$ is a smooth submanifold of $B$. qed
Remark: As it turns out, $f$ is a local diffeomorphism onto its image, i.e. $f: A \to C$ is a local diffeomorphism.
Assume that $F^{-1}(y)$ is infinite and pick a sequence of pairwise distinct points $x_n \in F^{-1}(y)$. Since $M$ is compact, $x_n$ has a convergent subsequence (which we again call $x_n$), so $x_n \rightarrow x$. Since $F^{-1}(y)$ is closed, $F(x) = y$. However, as you wrote, the inverse function theorem shows that $F$ is a local diffeomorphism in a neighborhood of $x$, and in particular one-to-one there. So there is a neighborhood of $x$ that contains no other point of $F^{-1}(y)$, contradicting the fact that the distinct points $x_n$ converge to $x$.
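For a concrete instance (my own illustrative choice, not part of the question), take $M = N = S^1 \subset \mathbb{C}$, $F(z) = z^{k}$ with $k \geq 1$, and $y = 1$:

$$F^{-1}(1) = \left\{ e^{2\pi i j/k} : j = 0, 1, \ldots, k-1 \right\}, \qquad \frac{d}{dt}F\left(e^{it}\right) = i k\, e^{ikt} \neq 0.$$

Every point of $S^1$ is a regular point of $F$, and the fibre over $1$ consists of exactly $k$ points, each isolated from the others, as the argument predicts.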