I'm going to take their definition and go backwards. Let's start with the definition of a manifold. It sounds like to you, a manifold means a subset $M$ of $\Bbb R^n$ such that for all $p \in M$, there is a diffeomorphism $\psi: \Bbb R^n \to U$, $U$ an open subset of $\Bbb R^n$ that contains $p$, such that $M \cap U = \psi(\Bbb R^k)$, where $\Bbb R^k$ is the subspace of $\Bbb R^n$ where only the first $k$ coordinates can be nonzero.
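As a quick sanity check of this definition (my own example, not taken from the question): let $M = \{(x, g(x)) : x \in \Bbb R\} \subset \Bbb R^2$ be the graph of a smooth function $g: \Bbb R \to \Bbb R$. Here $n = 2$, $k = 1$, and a single chart with $U = \Bbb R^2$ works for every $p \in M$:

$$\psi(u,v) = (u,\ v + g(u)), \qquad \psi^{-1}(x,y) = (x,\ y - g(x)), \qquad \psi(\Bbb R^1) = \{(u, g(u))\} = M = M \cap U.$$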
Now, let's define an embedding of manifolds. What they say is almost fine - provided they add the phrase "such that $f(M)$ is a manifold". That is, I think their definition should say "An embedding $f: M \to N$ is a smooth map such that $f(M)$ is a manifold and such that $f$ is a diffeomorphism onto its image." If we don't demand that $f(M)$ is a manifold, this just doesn't make sense. (For an example where $f(M)$ is not a manifold, take e.g. the figure 8: I can make this the image of a smooth immersion from the circle. Also see, for instance, the graph of $|x|$, which I can make the image of an injective smooth map from $\Bbb R$ - just not an immersion.)
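For concreteness, here is one way to realize the graph of $|x|$ as such an image (a sketch; the particular formula is my choice, and any smooth map that is flat at $0$ would do just as well). Define

$$f(t) = \left(t\,e^{-1/t^2},\ |t|\,e^{-1/t^2}\right) \text{ for } t \neq 0, \qquad f(0) = (0,0).$$

Both components are smooth because $e^{-1/t^2}$ vanishes to infinite order at $t = 0$. The first component has derivative $e^{-1/t^2}(1 + 2/t^2) > 0$ for $t \neq 0$, so it is strictly increasing and unbounded in both directions; hence $f$ is injective with image exactly $\{(x, |x|) : x \in \Bbb R\}$. But $f'(0) = (0,0)$, so $f$ is not an immersion.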
Note, in addition, that this contains the demand that it be a topological embedding - a homeomorphism onto its image. An injective immersion is not good enough by itself: take the figure 8 above, and then write it as the injective image of $\Bbb R$. (The two 'tails' of $\Bbb R$ approach the center point from the bottom left and top right.) Then this is not a homeomorphism onto its image: points far out along either tail land arbitrarily close to the center point, so the inverse is not continuous there. Injective proper maps, on the other hand, are automatically topological embeddings in this context. (Properness is sufficient but not necessary, and there are cases where you don't want to require it: open subsets of $\Bbb R^n$, like $GL_n(\Bbb R) \subset \text{Mat}(n \times n)$!)
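If you want to see the failure numerically, here is a minimal sketch. The parametrization $t \mapsto (-\sin 2t, \sin t)$ on $(-\pi, \pi)$, the substitution $t = 2\arctan s$, and all function names are my own choices for illustration; any figure 8 with the tails arranged as above behaves the same way.

```python
import math

def figure_eight(t):
    """One possible figure-8 parametrization on the open interval (-pi, pi)."""
    return (-math.sin(2 * t), math.sin(t))

def from_real_line(s):
    """Reparametrize by s in R via t = 2*atan(s), which maps R onto (-pi, pi)."""
    return figure_eight(2 * math.atan(s))

def speed(t):
    """Norm of the velocity (-2 cos 2t, cos t) of figure_eight at t."""
    return math.hypot(-2 * math.cos(2 * t), math.cos(t))

center = from_real_line(0.0)  # the crossing point of the figure 8

# Points far out on the real line land arbitrarily close to the center,
# although they are nowhere near s = 0 in the domain.
for s in (1e1, 1e3, 1e5, -1e1, -1e3, -1e5):
    x, y = from_real_line(s)
    print(f"s = {s:>9.0f}  image = ({x:+.6f}, {y:+.6f})  dist to center = {math.hypot(x, y):.2e}")

# Sample the speed on a grid inside (-pi, pi): it stays away from zero,
# which is consistent with the map being an immersion.
min_speed = min(speed(-math.pi + k * 2 * math.pi / 10_000) for k in range(1, 10_000))
print(f"minimum sampled speed: {min_speed:.3f}")
```

The positive tail comes in from the top right and the negative tail from the bottom left, so the sequence $s_n \to \infty$ has images converging to the image of $0$ while $s_n$ itself does not converge to $0$.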
Given this, let's prove the claim. First, it had better be injective, because it's a topological embedding. Why is it an immersion? Let's go back to the definition of diffeomorphism: a diffeomorphism is, in particular, an immersion. So if it's a diffeomorphism onto its image, then the map $M \to f(M)$ is an immersion; and by the fact that $f(M)$ is a manifold sitting inside $N$, the inclusion map $f(M) \to N$ is an immersion. Since $f$ is the composite $M \to f(M) \to N$ and a composite of immersions is an immersion, $f$ is an immersion. (If you want to be careful about the proof of this, think about the charts we were guaranteed in the first paragraph in the definition of manifold, and think about how they automatically imply that the inclusion map $f(M) \to \Bbb R^K$ is an immersion, where $\Bbb R^K$ is whatever Euclidean space $N$ sits inside.)
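In symbols, write $f = \iota \circ \tilde f$, where $\tilde f: M \to f(M)$ is $f$ with its codomain restricted and $\iota: f(M) \to N$ is the inclusion (the names $\tilde f$ and $\iota$ are mine). The chain rule gives, for each $p \in M$,

$$df_p = d\iota_{\tilde f(p)} \circ d\tilde f_p,$$

and a composite of injective linear maps is injective, so $df_p$ is injective.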
Let $\phi$ be the parametrization of the 2-torus in $\mathbb{R}^4$ given by $\phi: \mathbb{R}^2 \to T^2 \subset \mathbb{R}^4$, $\phi(x,y) = (\sin(x),\cos(x),\sin(y),\cos(y))$. Now let $\pi$ be a line in $\mathbb{R}^2$ with irrational slope, say $\sqrt{2}$, and let $\varphi:=\phi_{\vert \pi}$. This map is differentiable and is in fact an injective immersion, but it is not an embedding: its image is one-dimensional yet dense in the torus, so $\varphi$ is not a homeomorphism onto its image. Actually, in both this example and in yours, the pathological behaviour is in some sense caused by the fact that those maps are not "proper": they send too many points near infinity close to other points of the image. Formalizing this idea leads to the definition of an embedding.
It is necessary for it to be an immersion since, if it is not, then the Jacobian matrix fails to have maximal rank somewhere, and so the map cannot have a differentiable inverse on its image.
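Here is a small numerical sketch of the "points near infinity land near others" behaviour for the line $y = \sqrt{2}\,x$ through the origin. The function names and the choice of sample points are mine: I use $t = 2\pi q$, where $q$ runs over the denominators of the continued-fraction convergents $p/q$ of $\sqrt{2}$, because then $q\sqrt{2}$ is very close to the integer $p$.

```python
import math

SLOPE = math.sqrt(2)  # irrational slope of the line

def phi(x, y):
    """Torus parametrization phi(x, y) = (sin x, cos x, sin y, cos y) in R^4."""
    return (math.sin(x), math.cos(x), math.sin(y), math.cos(y))

def curve(t):
    """Restriction of phi to the line y = sqrt(2) * x, parametrized by x = t."""
    return phi(t, SLOPE * t)

def speed(t):
    """Norm of the velocity of curve at t; equals sqrt(1 + 2) for every t."""
    return math.hypot(math.cos(t), -math.sin(t),
                      SLOPE * math.cos(SLOPE * t), -SLOPE * math.sin(SLOPE * t))

start = curve(0.0)

# Convergents p/q of sqrt(2): 3/2, 7/5, 17/12, 41/29, ...
p, q = 1, 1
returns = []  # pairs (t, distance from curve(t) to curve(0))
for _ in range(8):
    p, q = p + 2 * q, p + q          # next convergent of sqrt(2)
    t = 2 * math.pi * q              # x-coordinate is back at 0 mod 2*pi
    d = math.dist(curve(t), start)   # y-coordinate is off by 2*pi*(q*sqrt(2) - p)
    returns.append((t, d))
    print(f"q = {q:>4}  t = {t:>9.2f}  distance to start = {d:.5f}")
```

The parameter $t$ runs off to infinity while the image points converge to the starting point, so the inverse of $\varphi$ on its image cannot be continuous; meanwhile the speed is constant and nonzero, so $\varphi$ is still an immersion.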
Best Answer
This is slightly subtle. It is not a homeomorphism onto its image in the subspace topology. Recall that, given a topological space $X$ and a subset $Y\subseteq X$, the open subsets of $Y$ in the subspace topology inherited from $X$ are those of the form $U\cap Y$ for $U$ open in $X$.
Let $f:[0,1)\to \mathbb{R}^2$ denote the figure-six map. In this case, you should think about what the open neighborhoods of the point at the "join" between the loop of the $6$ and the stem of the $6$ (the point of almost intersection) are in the subspace topology. Now, can you find a reason why such an $f$ cannot be a homeomorphism?
As for your question about the immersion: you should think more carefully about the differential. If you parametrize the map appropriately, the differential will be zero at no point in the domain. Indeed, you can think of having a particle traversing the figure $6$ in time $t\in [0,1)$ without stopping.