$\newcommand{\M}{M}$
$\newcommand{\N}{N}$
$\newcommand{\brk}[1]{\left(#1\right)}$
$\newcommand{\be}{\beta}$
$\newcommand{\al}{\alpha}$
$\newcommand{\til}{\tilde}$
The pullback bundle is indeed an embedded submanifold of the product $M \times E$. The essential notion here is transversality, together with the fact that the bundle projection is a submersion.
The full story, with all the details, is a bit long. I haven't seen it done in any single textbook; I found the steps in various places and assembled my own picture of things.
Definitions
$(1)$ Let $\M,\N$ be smooth manifolds. Suppose $F:\N \to \M$ is a smooth map and $S \subseteq \M$ is an embedded submanifold. We say $F$ is transverse to $S$ if $\forall x \in F^{-1}\brk{S} \, , \, T_{F(x)}\M= T_{F(x)}S + dF_x(T_x\N)$.
$(2)$ Let $\M,\N,\N'$ be smooth manifolds. Suppose $F:\N \to \M \, , \, F':\N' \to \M$ are smooth maps. We say $F,F'$ are transverse to each other if $\forall x \in \N, x' \in \N' $ such that $F(x)=F'(x')$ , $T_{F(x)}\M=dF_x(T_x\N) + dF'_{x'}(T_{x'}\N')$.
Note: If either one of $F,F'$ is a submersion, then they are automatically transverse.
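To see why this note holds, here is the one-line check, written for the case where $F$ is the submersion (the other case is symmetric). Surjectivity of $dF_x$ already fills the whole tangent space:
$$ dF_x(T_xN) = T_{F(x)}M \quad\Longrightarrow\quad dF_x(T_xN) + dF'_{x'}(T_{x'}N') = T_{F(x)}M . $$
For contrast, a standard non-example (my own illustration, not needed later): take $F, F' : \mathbb{R} \to \mathbb{R}^2$ with $F(t) = (t,0)$ and $F'(t) = (t,t^2)$. They meet only at the origin, where
$$ dF_0(T_0\mathbb{R}) = dF'_0(T_0\mathbb{R}) = \mathbb{R} \times \{0\} \neq T_{(0,0)}\mathbb{R}^2 , $$
so these two maps are not transverse to each other.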
Note: Some of the proofs are at the end of the answer, so that it is possible to skim through the general scheme first without all the details.
Lemma (1):
Let $\M,\N$ be smooth manifolds and let $S \subseteq \M$ be an embedded submanifold. Let $F:\N \to \M$ be transverse to $S$. Then $F^{-1}(S)$ is an embedded submanifold of $\N$ whose codimension is equal to the codimension of $S$ in $\M$.
proof:
See Theorem 6.30 in Lee (p. 144).
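A small sanity check of Lemma 1 on a familiar example (the map and the sets below are my own choice for illustration). Take $F : \mathbb{R}^2 \to \mathbb{R}$, $F(x,y) = x^2 + y^2$, and $S = \{1\} \subset \mathbb{R}$. Then
$$ dF_{(x,y)} = \begin{pmatrix} 2x & 2y \end{pmatrix} \neq 0 \quad \text{for all } (x,y) \in F^{-1}(1), $$
so $dF_{(x,y)}$ is onto $T_1\mathbb{R}$ and $F$ is transverse to $S$. The preimage $F^{-1}(S)$ is the unit circle, of codimension $1$ in $\mathbb{R}^2$, matching the codimension of $S$ in $\mathbb{R}$. With $S = \{0\}$ instead, transversality fails at the origin, since
$$ T_0 S + dF_{(0,0)}(T_{(0,0)}\mathbb{R}^2) = 0 \neq T_0\mathbb{R}, $$
and indeed $F^{-1}(0)$ is a single point, of codimension $2$ rather than $1$.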
Lemma (2): (This is exercise 13 in chapter 6, Lee)
Let $\M,\N,\N'$ be smooth manifolds. Suppose $F:\N \to \M \, , \, F':\N' \to \M$ are smooth maps. Then $F,F'$ are transverse to each other if and only if the map $F \times F' : \N \times \N' \to \M \times \M$ is transverse to the diagonal $\Delta_\M = \{(x,x) \mid x \in \M \}$.
Lemma (3):
Let $\M$ be a smooth manifold. Then $\Delta_\M = \{(x,x)|x \in \M \}$ is an embedded (smooth) submanifold of $\M \times \M$.
Lemma (4):
Let $\M$ be a smooth manifold and let $\Delta_\M$ be the diagonal of $\M$ (see Lemma 3). Then $T_{\brk{x,x}}\Delta_\M = \text{diag}\brk{T_x\M \times T_x\M}=\{(v,v) \mid v \in T_x\M \}$ (i.e. the tangent space of the diagonal is the diagonal of the tangent space).
proof:
Since any tangent vector can be realized as the derivative of a path, the tangent space to a manifold is the set of derivatives of paths. Since $\Delta_\M$ is an embedded submanifold, a path $\be:I \to \Delta_\M $ is smooth if and only if it is smooth when considered as a path into the product $\M \times \M$, if and only if each of its components is smooth. So $\be(t)=\brk{\al\brk{t},\al\brk{t}}$, where $\al : I \to \M$ is smooth, and $\dot \be (0) \overset{(*)}= \brk{\dot \al (0),\dot \al (0)}$. Conversely, every $v \in T_x\M$ is $\dot \al (0)$ for some smooth path $\al$, and then $(v,v)$ is the derivative of the path $\brk{\al,\al}$ in $\Delta_\M$. Hence the tangent space to the diagonal is exactly the diagonal of the tangent space. (In $(*)$ we used the canonical isomorphism $T_{(x,x')}\brk{\M \times \M'} \cong T_x\M \oplus T_{x'}\M'$ given by the differentials of the projections onto the two components.)
corollary (1):
Let $\M,\N,\N'$ be smooth manifolds and let $F:\N \to \M \, , \, F':\N' \to \M$ be smooth maps. (In short we write $\N\overset{F}{\rightarrow}\M\overset{F'}{\leftarrow}\N'$.) Assume $F,F'$ are transverse to each other. Then the fiber product of this diagram, defined as $\N \times_\M \N' = \{\brk{x,x'} \in \N \times \N' \mid F(x)=F'(x') \}$, is an embedded smooth submanifold of the product $\N \times \N'$.
proof of corollary (1):
The fiber product $\N \times_{\M} \N'$ is the inverse image $(F \times F')^{-1}\brk{\Delta_{\M}}$. By Lemma 3, $\Delta_\M$ is an embedded submanifold of $\M \times \M$. By Lemma 2, $F \times F'$ is transverse to $\Delta_\M$, so Lemma 1 applies.
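As a by-product, Lemma 1 also gives the dimension of the fiber product. The following count is a sketch, assuming each manifold has a single well-defined dimension. The diagonal has codimension $2\dim M - \dim M = \dim M$ in $M \times M$, so
$$ \dim\brk{N \times_M N'} = \dim\brk{N \times N'} - \operatorname{codim}\Delta_M = \dim N + \dim N' - \dim M . $$
For example, two transverse curves in a surface ($\dim N = \dim N' = 1$, $\dim M = 2$) have a $0$-dimensional fiber product, i.e. a discrete set of intersection pairs.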
corollary (2):
Let $\M,\N,\N'$ be smooth manifolds, $\N\overset{F}{\rightarrow}\M\overset{F'}{\leftarrow}\N'$. If either one of $F,F'$ is a submersion, then the fiber product $\N \times_\M \N' = \{\brk{x,x'} \in \N \times \N' \mid F(x)=F'(x') \}$ is an embedded submanifold of the product $\N \times \N'$.
proof of corollary (2):
If one of $F,F'$ is a submersion, then these two maps are automatically transverse to each other. Now use corollary (1).
In particular we get the following proposition:
Let $\pi: E \to B$ be a smooth vector bundle and let $f:B' \to B$ be a smooth map. The pullback bundle $f^*\brk{E}$ is an embedded submanifold of the product $B' \times E$.
(This is because the bundle projection $\pi$ is always a submersion.)
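To connect this with the corollaries, here is a short sketch of how the proposition unwinds, writing $r$ for the rank of $E$ (my notation). The pullback bundle is the fiber product of $B' \overset{f}{\rightarrow} B \overset{\pi}{\leftarrow} E$:
$$ f^*E = B' \times_B E = \{ (b',e) \in B' \times E \mid f(b') = \pi(e) \} . $$
In a local trivialization $\pi^{-1}(U) \cong U \times \mathbb{R}^r$ the map $\pi$ is the projection onto $U$, which is why it is a submersion. The usual dimension count for a fiber product along transverse maps then gives, using $\dim E = \dim B + r$,
$$ \dim f^*E = \dim B' + \dim E - \dim B = \dim B' + r , $$
as one expects for a rank $r$ bundle over $B'$.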
proof of Lemma (2):
First, we need a sublemma:
Sub-lemma:
Let $V$ be a vector space, $V_1,V_2 \subseteq V$ are subspaces. Let $\text{diag}(V \times V) = \{(v,v)|v \in V \} $. Then
$V \oplus V = \text{diag}(V \times V) + \brk{V_1 \oplus V_2} \iff V = V_1 + V_2$
proof of the sublemma:
$\Rightarrow :$ Let $v \in V$. Then $(v,0) \in V \oplus V$, hence by our assumption $\exists \til v \in V, v_1 \in V_1 , v_2 \in V_2$ such that $(v,0) = (\til v ,\til v) + (v_1,v_2)=(\til v + v_1, \til v + v_2) \Rightarrow \til v = -v_2, v = \til v + v_1 = v_1 - v_2 \in V_1 + V_2 $ .
$\Leftarrow :$ Note that the right-hand side $\text{diag}(V \times V) + \brk{V_1 \oplus V_2}$ is a subspace of $V \oplus V$, and that $V \oplus V$ is spanned by the vectors $(v,0)$ and $(0,v)$. Hence, by symmetry, it is enough to show that $\forall v \in V \, , \, (v,0) \in \text{diag}(V \times V) + \brk{V_1 \oplus V_2}$. By the assumption $V =V_1 + V_2$ (and since $V_2$ is closed under negation), $\exists v_i \in V_i$ such that $v = v_1 -v_2$. Define $\til v = -v_2$, so we get $(v,0)=(v_1-v_2,\til v +v_2)=(v_1 + \til v, v_2 + \til v) = (\til v,\til v) +(v_1,v_2)$.
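When $V$ is finite-dimensional (as it is for tangent spaces), the sub-lemma can also be double-checked by a dimension count. This is only a cross-check, not a replacement for the proof above. Since $\text{diag}(V \times V) \cap \brk{V_1 \oplus V_2} = \{(v,v) \mid v \in V_1 \cap V_2\}$, we get
$$ \dim\Big(\text{diag}(V \times V) + \brk{V_1 \oplus V_2}\Big) = \dim V + \dim V_1 + \dim V_2 - \dim\brk{V_1 \cap V_2} = \dim V + \dim\brk{V_1 + V_2} , $$
which equals $\dim\brk{V \oplus V} = 2\dim V$ exactly when $\dim\brk{V_1 + V_2} = \dim V$. For instance, in $V = \mathbb{R}^2$ with $V_1 = \operatorname{span}(e_1)$ and $V_2 = \operatorname{span}(e_2)$ the count gives $2 + 2 = 4$, while for $V_1 = V_2 = \operatorname{span}(e_1)$ it gives $2 + 1 = 3 < 4$.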
Now to the actual proof of Lemma (2):
By definition (1), $F \times F'$ is transverse to the diagonal if and only if
\begin{split}
&\forall (x,x') \in (F \times F')^{-1}\brk{\Delta_\M} \, , \, T_{(F \times F')\brk{x,x'}}\brk{\M \times \M}= T_{(F \times F')(x,x')}\Delta_\M + d(F \times F')_{(x,x')}\brk{T_{(x,x')}\brk{\N \times \N'}} \iff \\
& T_{\brk{F(x),F'(x')}}\brk{\M \times \M}= T_{\brk{F(x),F'(x')}}\Delta_\M + d(F \times F')_{(x,x')}\brk{T_x\N \oplus T_{x'}\N'} \iff \\
& T_{\brk{F(x),F(x)}}\brk{\M \times \M}= T_{\brk{F(x),F(x)}}\Delta_\M + d(F \times F')_{(x,x')}\brk{T_x\N \oplus T_{x'}\N'} \overset{\text{Lemma 4}}{\iff} \\
&T_{F(x)}\M \oplus T_{F(x)}\M = \text{diag}\brk{T_{F(x)} \M \times T_{F(x)}\M} + \brk{ dF_x \brk{T_x \N} \oplus dF'_{x'} \brk{T_{x'}\N'}} \overset{\text{Sub-lemma}}{\iff} \\
&T_{F(x)}\M =dF_x \brk{T_x \N} + dF'_{x'} \brk{T_{x'}\N'}
\end{split}
Since the last row is the definition of transverse maps, we are done.
proof of Lemma (3):
The diagonal is the graph of the smooth map $Id_\M$, and graphs of smooth maps are always embedded submanifolds of the product of the domain and the codomain. (See Proposition 5.4 in Lee.)
I haven't carefully read the proof in your post. I am more interested in whether this quotient is a topological manifold than in whether it is a smooth one for which the quotient map is a submersion. The latter, as you say, is not true, and this is straightforward to see: the differential must kill the tangent space of $S$, but the quotient "manifold" still has the same dimension as $M$, because $S$ has positive codimension.
And yes, this is indeed possible. Take $\Bbb{CP}^1 \subset \Bbb{CP}^2$. Collapsing this to a point gives you a copy of $S^4$. (To see this straightforwardly, note that $\Bbb{CP}^2$ is obtained by gluing a $D^4$ to $S^2$ by the Hopf map on the boundary; collapsing the $S^2$ gives us a space obtained by collapsing the boundary of $D^4$ to a point: $S^4$.)
Here is how I came up with the example, and the general principle at play here. The tubular neighborhood theorem says there is a neighborhood of a submanifold $S$ diffeomorphic to a vector bundle (the normal bundle) over $S$. So it suffices to answer the question in this case. Now if $E$ is a bundle and $S$ the zero section, $E/S$ is contractible, so it has trivial reduced homology. This implies that $\tilde H_{*-1}(E/S - [S]) \cong H_*(E/S,E/S-[S])$ by the relative long exact sequence. The first group is isomorphic to $\tilde H_{*-1}(E-S)$ (the spaces are homeomorphic!), and $E-S$ deformation retracts onto the unit sphere bundle of the vector bundle. Now we can conclude: if $E/S$ were a manifold, then the local homology $H_*(E/S,E/S-[S])$ at $[S]$ would be that of a point in a manifold, so the sphere bundle would have the same homology as a sphere of the appropriate dimension. This is false for many, many vector bundles (for instance, if you collapse a circle in a manifold of dimension at least 3, the result is never a manifold) but not always false; the way the above example came to mind is that the circle bundle over $S^2$ with Euler class 1 is $S^3$, and hence of course does have the same homology as $S^3$...
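To make the long exact sequence step explicit, here is the computation spelled out, with $X = E/S$, $x_0 = [S]$, $n = \dim E$ and $S(E)$ the unit sphere bundle (this notation is mine). The relevant piece of the sequence of the pair is
$$ \tilde H_k(X) \to H_k(X, X - x_0) \xrightarrow{\ \partial\ } \tilde H_{k-1}(X - x_0) \to \tilde H_{k-1}(X) , $$
and both outer groups vanish because $X$ is contractible, so $\partial$ is an isomorphism. If $X$ were an $n$-manifold, then
$$ \tilde H_{k-1}(S(E)) \cong \tilde H_{k-1}(X - x_0) \cong H_k(X, X - x_0) \cong \begin{cases} \mathbb{Z} & k = n \\ 0 & k \neq n , \end{cases} $$
which is the reduced homology of $S^{n-1}$. As a test case, for a circle with trivial normal bundle in a $3$-manifold we have $S(E) = S^1 \times S^1$ and $H_1(S^1 \times S^1) \cong \mathbb{Z}^2 \neq 0 = H_1(S^2)$, so the quotient is not a manifold.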
Now here's a full proof that the quotient is a manifold if and only if the sphere bundle is a sphere. (It's quite clean, cleaner than the above, but I thought I would leave the above in to show my thought process.) The quotient $E/S$ is homeomorphic to the cone on the sphere bundle, so when the sphere bundle is a sphere, a neighborhood of $[S]$ is a ball. Conversely, the above argument shows that whenever the sphere bundle has the wrong homology, the quotient is not a manifold. When it has the right homology, but is not simply connected, follow the argument here. Now any closed simply connected manifold with the same homology as a sphere is a sphere, by the Poincaré conjecture.
OK, when is a sphere bundle a sphere? Suppose $S^n$ fits into a fibration with fiber $S^k, k>0$, and base $M$. Then $M$ is necessarily simply connected, and a straightforward spectral sequence argument shows that the cohomology of the base is necessarily $\Bbb Z[x]/(x^n)$ where $|x|$ is even. This is only possible if $|x|$ is 2 or 4 (or $|x|=8$ and $n \leq 3$); see proposition 4L.10 of Hatcher. So necessarily the base has the same cohomology ring as $\Bbb{CP}^n$ or $\Bbb{HP}^n$ and is simply connected (or the base is $S^8 = \Bbb{OP}^1$ or has the cohomology ring of $\Bbb{OP}^2$). In the case $k=0$ the same holds with $\Bbb{RP}^n$ and $\Bbb Z/2$-coefficients. Unfortunately, this is as far as we go: it is entirely possible that this is not the obvious fibration. For instance, there are fake projective spaces, both real and complex, all of which (by the argument given on the latter page) support sphere bundles with total space $S^n$. I'm sure there's also a classification of fake quaternionic projective spaces. This is probably a complete classification of sphere bundles with appropriate total space: all are $S^0$, $S^1$, or $S^3$ bundles over a fake projective space (or again the case of $\Bbb{OP}^2$). I am not going to carry out the details that this is the case. (EDIT: Yes, this is the classification; simply connected manifolds with this cohomology ring are homotopy equivalent to the appropriate projective space, because they have a CW decomposition with a cell in each degree $kn$, $k=2,4,8$; then these are homotopy equivalent to the projective spaces by an inductive argument.)
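For orientation, here are the "obvious" fibrations referred to above, i.e. the standard Hopf fibrations, together with the quotients they produce. I use the index $m$ for the base to avoid clashing with the $n$ above; this list is a reminder of standard facts, not part of the classification argument.
$$ \begin{array}{lll} S^0 \to S^{m} \to \Bbb{RP}^m , & & \Bbb{RP}^{m+1}/\Bbb{RP}^m \cong S^{m+1} \\ S^1 \to S^{2m+1} \to \Bbb{CP}^m , & & \Bbb{CP}^{m+1}/\Bbb{CP}^m \cong S^{2m+2} \\ S^3 \to S^{4m+3} \to \Bbb{HP}^m , & & \Bbb{HP}^{m+1}/\Bbb{HP}^m \cong S^{4m+4} \\ S^7 \to S^{15} \to S^8 = \Bbb{OP}^1 , & & \Bbb{OP}^{2}/\Bbb{OP}^1 \cong S^{16} \end{array} $$
In each row the sphere bundle is the unit normal bundle of the smaller projective space inside the larger one, and the quotient is a sphere because the larger space is obtained from the smaller one by attaching a single top-dimensional cell. The case $m = 1$ of the second row is the $\Bbb{CP}^1 \subset \Bbb{CP}^2$ example.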
I've been struggling with this exercise for a while too. Here is my solution. I agree with your argument that $\pi : E \rightarrow M$ is a submersion. To show that $E_p = \pi^{-1}(p)$ is a regular submanifold diffeomorphic to $F$, we can use Theorem 3.5 in Jeffrey Lee's book. That is, we just need to exhibit a smooth immersion that is a homeomorphism onto the fibre $E_p$.
$\textbf{Proof that $E_p$ is a regular submanifold diffeomorphic to $F$}:$
Consider the diffeomorphism $\phi : \pi^{-1}(U) \rightarrow U \times F$. Note that $\{p\} \times F$ is a regular submanifold of $U\times F$, so the restriction of the smooth map $\phi^{-1}$ to $\{p\}\times F$ is smooth. That is, we have a smooth map $$ \phi^{-1}|_{\{p\}\times F} : \{p\}\times F \rightarrow \pi^{-1}(U) $$ whose image is $\pi^{-1}(p)=E_p$. Because $\phi^{-1}$ is a diffeomorphism, $d\phi^{-1}$ is an isomorphism at every point of $U\times F$. Therefore the differential of the map $\phi^{-1}|_{\{p\}\times F}$ is injective at each point, so the map is an immersion. It is also a homeomorphism onto its image $E_p$, being the restriction of a homeomorphism. Therefore $E_p$ is a regular submanifold diffeomorphic to $F$.
(Note: the "diffeomorphic" part of this conclusion is not stated explicitly in Theorem 3.5, but it can be proved. I prefer the version of this theorem in other books, such as Proposition 5.2 in John Lee's Smooth Manifolds, which includes it.)
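The injectivity of the differential is just the chain rule. Writing $j : \{p\}\times F \hookrightarrow U \times F$ for the inclusion (a name I introduce here), we have $\phi^{-1}|_{\{p\}\times F} = \phi^{-1} \circ j$, so at each point $(p,y)$
$$ d\brk{\phi^{-1} \circ j}_{(p,y)} = d\brk{\phi^{-1}}_{(p,y)} \circ dj_{(p,y)} , $$
where $dj_{(p,y)}$ is injective (the inclusion of a regular submanifold is an immersion) and $d\brk{\phi^{-1}}_{(p,y)}$ is an isomorphism. An isomorphism composed with an injective linear map is injective.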
$\textbf{Proof that if $F$ and $M$ are connected, then so is $E$}$ :
To prove this I need the following theorem from topology (e.g. from Willard's book): if a topological space $X$ is connected and $\mathscr{U}$ is an open cover of $X$, then any two points can be connected by a simple chain consisting of elements of $\mathscr{U}$.
By local triviality, for each $p \in M$ we have an open subset $U \subset M$ containing $p$ and a diffeomorphism $\phi : \pi^{-1}(U) \rightarrow U \times F$. Since $M$ is locally connected, we may shrink each $U$ so that it is connected. The sets $U$ form an open cover of the connected space $M$, and $\{\pi^{-1}(U)\}$ is an open cover of $E$. Given $v,w \in E$, the theorem above gives a simple chain $U_1,\dots,U_k$ from $\pi(v)$ to $\pi(w)$, and since consecutive $U_i$ intersect, so do consecutive $\pi^{-1}(U_i)$. Because $F$ and $U$ are connected, $U \times F$ is connected, and hence so is $\phi^{-1}(U\times F) = \pi^{-1}(U)$. So we have a chain from $v$ to $w$ in which each element is connected, and therefore path-connected, being an open subset of the manifold $E$. We can now build a path connecting $v$ and $w$ by joining paths inside the successive elements of the chain.
I think there are many ways to prove this, but this is the one I find most convincing. Let me know if you have another solution or find an error in my proof.
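The last step can be written out as follows. The names $W_i$, $x_i$ and $\gamma_i$ are introduced only for this sketch. Let $W_1,\dots,W_k$ be a chain of path-connected sets of the form $\pi^{-1}(U)$ with $v \in W_1$, $w \in W_k$ and $W_i \cap W_{i+1} \neq \emptyset$. Put $x_0 = v$, $x_k = w$, and choose
$$ x_i \in W_i \cap W_{i+1} \quad (1 \le i \le k-1), \qquad \gamma_i : [0,1] \to W_i \ \text{ a path from } x_{i-1} \text{ to } x_i \quad (1 \le i \le k) . $$
Such $\gamma_i$ exist because both $x_{i-1}$ and $x_i$ lie in the path-connected set $W_i$. The concatenation
$$ \gamma = \gamma_1 * \gamma_2 * \cdots * \gamma_k $$
is then a path in $E$ from $v$ to $w$.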
$\textbf{EDIT}$:
To see more clearly that $E_p$ and $F$ are diffeomorphic, we can restrict the map $\phi : \pi^{-1}(U) \rightarrow U \times F$ in both domain and codomain (both of which are regular submanifolds) to $E_p$ and $\{p\} \times F$ respectively, obtaining the map $$ \phi|_{E_p}: E_p \rightarrow \{p\} \times F $$ Because restricting a smooth map to a domain (or codomain) that is a regular submanifold gives a smooth map, both this map and its inverse are smooth, so it is a diffeomorphism. For an alternative to the second proof (connectedness of $E$), see the question "Show that the total space $E$ of a fibre bundle $\pi : E \rightarrow M$ is connected".
Remark about restricting a smooth map to a regular submanifold:
$\bullet$ Restriction to Domain
Let $F : M \rightarrow N$ be a smooth map, let $S \subset M$ be a regular submanifold, and let $\iota : S \hookrightarrow M$ be the inclusion map. Then $F|_S = F \circ \iota : S \rightarrow N$ is smooth.
I first learned this result from John Lee's book Smooth Manifolds (p.112), but I didn't find it stated explicitly in Jeff Lee's book (which doesn't mean it isn't there, since I haven't read Jeff Lee's book thoroughly); maybe that is because the result is easy to prove. The proof in John Lee's book is essentially one line: the inclusion $\iota$ is smooth, so $F|_S = F \circ \iota$ is a composition of smooth maps.
Here the argument "$S\subset M \quad \text{reg. submanifold} \implies \iota : S \hookrightarrow M \quad \text{smooth}$ " follows from the definition, because John Lee defines embedded submanifolds (regular submanifolds) very carefully from the beginning (see the definition on page 98). If you want to refer to Jeff Lee's book, he mentions it on p.132 (in the paragraph above Corollary 1.35).
But I don't think it is really that easy for a beginner to see. So I think the better way to study submanifolds carefully is with John Lee's book. However, I have just found a direct proof in L. Tu's book, Theorem 11.14.
$\bullet$ Restriction to Codomain
This result is available in both John Lee's book (p.113, Corollary 5.30) and Jeff Lee's book (p.132, Corollary 3.15). However, as usual, I prefer John Lee's. In paraphrase, it says: if $S \subset M$ is an embedded submanifold, then every smooth map $G : N \rightarrow M$ whose image is contained in $S$ is also smooth as a map from $N$ to $S$.
By combining these two results, we can safely say that $$ \phi|_{E_p}: E_p \rightarrow \{p\} \times F $$ is a diffeomorphism.
[See Walter Poor's Differential Geometric Structures for essentially the same proof, except that he uses the Regular Level Set Theorem to show that $E_p = \pi^{-1}(p)$ is a regular submanifold of $E$.]
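For completeness, here is how the Regular Level Set Theorem route goes. This is only a sketch, assuming as above that $\pi$ is a submersion and that $M$, $F$ and $E$ each have a well-defined dimension. Since $d\pi_v$ is surjective for every $v \in E$, every $p \in M$ is a regular value of $\pi$, so
$$ E_p = \pi^{-1}(p) \ \text{ is a regular submanifold of } E \text{ with } \dim E_p = \dim E - \dim M = \dim F , $$
where the last equality uses the local diffeomorphism $\pi^{-1}(U) \cong U \times F$. This gives the submanifold structure on $E_p$; the diffeomorphism with $F$ still comes from restricting $\phi$ as above.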