The existence of flows in the direction of a vector field seems to require the Hausdorff condition; indeed, consider the vector field $\frac{\partial}{\partial x}$ on the line-with-two-origins. There is no globally defined flow for any positive time $t$, even if we make our space compact (that is, if we consider the circle-with-one-point-doubled). If the nonexistence of the flow is not visibly clear, consider instead the real line with the interval $[0,1]$ doubled.
Also, partitions of unity do not exist; for example, in the line with two origins, take the open cover by "the line plus the first origin" and "the line plus the second origin". There is no partition of unity subordinate to this cover (the values at each origin would have to be 1).
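To spell out the parenthetical, here is a sketch of the argument. The names $U_1,U_2$ for the two sets of the cover, $0_1,0_2$ for the two origins, and $\rho_1,\rho_2$ for a hypothetical partition of unity subordinate to the cover are my own notation. Since $0_1\notin U_2$ and $0_2\notin U_1$,

$$\rho_2(0_1)=0,\qquad \rho_1(0_1)=1-\rho_2(0_1)=1,\qquad \rho_1(0_2)=0 .$$

But a continuous real-valued function cannot distinguish the two origins, since both are limits of the same points $x\neq 0$:

$$\rho_1(0_1)=\lim_{x\to 0,\ x\neq 0}\rho_1(x)=\rho_1(0_2),$$

which would force $1=0$.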
For me, a basic example of the beauty of this function-theoretic approach is the definition of a vector field as a derivation $D\colon C^\infty(M)\to C^\infty(M)$. The proof that such a derivation defines a vector field hinges upon the fact that $Df$ near a point $p$ depends only on $f$ near $p$. To prove this fact you use the fineness of your sheaf $\mathcal{O}_X$, i.e. the existence of partitions of unity. (It is true, though, that the failure of fineness in the non-Hausdorff case is of a different sort and might not break this particular theorem.) I feel that the existence of partitions of unity, and the implications thereof, is one of the fundamentals of approaching smooth manifolds through their functions; more importantly, a good handle on how partitions of unity are used is important for understanding the differences that arise when the same approach is extended to more rigid functions (holomorphic, algebraic, etc.).
Now that the question has been edited to ask specifically about Stokes' theorem, let me say a bit more. Stokes' theorem will be false for non-Hausdorff manifolds, because you can (loosely speaking) quotient out by part of your manifold, and thus part of its homology, without killing all of it.
For the simplest example, consider dimension 1, where Stokes' theorem is the fundamental theorem of calculus. Let $X$ be the forked line, the 1-dimensional (non-Hausdorff) manifold which is the real line with the half-ray $[0,\infty)$ doubled. For nonnegative $x$, denote the two copies of $x$ by $x^\bullet$ and $x_\bullet$, and consider the submanifold $M$ consisting of $[-1,0) \cup [0^\bullet,1^\bullet] \cup [0_\bullet,1_\bullet]$. The boundary of $M$ consists of the three points $[-1]$ (with negative orientation), $[1^\bullet]$ (with positive orientation), and $[1_\bullet]$ (with positive orientation); to see this, just note that every other point is a manifold point.
Consider the real-valued function on $X$ given by "$f(x)=x$" (by which I mean $f(x^\bullet)=f(x_\bullet)=x$). Its differential is the 1-form which we would naturally call $dx$. Now consider $\int_M dx$; it seems clear that this integral is 3, but I don't actually need this. Stokes' theorem would say that
$\int_M dx=\int_M df = \int_{\partial M}f=f(1^\bullet)+f(1_\bullet)-f(-1)=1+1-(-1)=3$.
This is all fine so far, but now consider the function given by $g(x)=x+10$. Since $dg=dx$, we should have
$\int_M dx=\int_M dg=\int_{\partial M}g=g(1^\bullet)+g(1_\bullet)-g(-1)=11+11-9=13$. Contradiction.
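The two boundary computations can be checked mechanically. The following is only a minimal sketch: the labels `1_upper` and `1_lower` are hypothetical names for $1^\bullet$ and $1_\bullet$, and the list `BOUNDARY` simply encodes the three oriented boundary points of $M$ described above.

```python
# Signed boundary of M in the forked line: (label, coordinate, orientation sign).
# "1_upper" / "1_lower" are hypothetical labels for the two copies of the point 1.
BOUNDARY = [("-1", -1.0, -1), ("1_upper", 1.0, +1), ("1_lower", 1.0, +1)]


def boundary_integral(func):
    """Sum func over the boundary points, weighted by orientation sign."""
    return sum(sign * func(x) for _, x, sign in BOUNDARY)


def f(x):
    return x


def g(x):
    return x + 10


# Net signed count of boundary points: +1 here, not 0.
net_sign = sum(sign for _, _, sign in BOUNDARY)

stokes_f = boundary_integral(f)  # f(1) + f(1) - f(-1)
stokes_g = boundary_integral(g)  # g(1) + g(1) - g(-1)

print(stokes_f, stokes_g, net_sign)
```

The two answers differ by $10$ times the net signed count of boundary points. On a compact oriented Hausdorff 1-manifold with boundary that count is zero, so adding a constant to $f$ changes nothing; here it is $+1$, which is exactly where the contradiction comes from.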
It's possible to explain this by the nonexistence of flows (instead of $df$, consider the flux of the flow by $\nabla f$). But also note that Stokes' theorem, i.e. homology theory, is founded on a well-defined boundary operation. However, without the Hausdorff condition, open submanifolds do not have unique boundaries, as for example $[-1,0)$ inside $X$, and so we can't break up our manifolds into smaller pieces. We can pass to the Hausdorff-ization as Andrew suggests by identifying $0^\bullet$ with $0_\bullet$, but now we lose additivity. Recall that $M$ was the disjoint union of $A=[-1,0)$ and $B=[0^\bullet,1^\bullet] \cup [0_\bullet,1_\bullet]$. So in the quotient $\partial [A] = [0]-[-1]$ and $\partial [B] = [1^\bullet]-[0]+[1_\bullet]-[0]=[1^\bullet]+[1_\bullet]-2[0]$, which shows that $\partial [M]\neq \partial [A]+\partial [B]$. This is inconsistent with any sort of Stokes formalism.
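Written out as formal sums of points, using the boundary of $M$ found earlier, the failure of additivity is:

$$\partial[A]+\partial[B]=\bigl([0]-[-1]\bigr)+\bigl([1^\bullet]+[1_\bullet]-2[0]\bigr)=[1^\bullet]+[1_\bullet]-[-1]-[0],$$

$$\partial[M]=[1^\bullet]+[1_\bullet]-[-1],\qquad\text{so}\qquad \partial[A]+\partial[B]-\partial[M]=-[0]\neq 0 .$$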
Finally, I'd like to point out that Stokes' theorem aside, even rather nice non-Hausdorff manifolds can be significantly more complicated than we might want to deal with. One nice example is the leaf-space of the foliation of the punctured plane by the level sets of the function $f(x,y)=xy$. The leaf-space looks like the union of the lines $y=x$ and $y=-x$, except that the intersection has been blown up to four points, each of which is dense in this subset. In general, any finite graph can be modeled as a non-Hausdorff 1-manifold by blowing up the vertices, and in higher dimensions the situation is even more confusing. So for any introductory explanation, I would strongly recommend requiring Hausdorff until the students have a lot more intuition about manifolds.
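To make the picture of this leaf space concrete, here is a sketch in my own notation (the names $L_c^{\pm}$ and $H_{\pm x}, H_{\pm y}$ are not standard). The leaves are the connected components of the level sets: two hyperbola branches for each $c\neq 0$, and four open half-axes for $c=0$,

$$L_c^{\pm}=\{(x,y): xy=c,\ \pm x>0\}\ \ (c\neq 0),\qquad H_{\pm x}=\{(\pm t,0): t>0\},\qquad H_{\pm y}=\{(0,\pm t): t>0\}.$$

As $c\to 0^+$, the first-quadrant leaves $L_c^{+}$ approach both $H_{+x}$ and $H_{+y}$, so this family has two distinct limits in the leaf space; the same happens in the other three quadrants, which is why the four points $H_{\pm x}, H_{\pm y}$ cannot be separated from one another in the way the Hausdorff condition requires.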
Yes, the Whitney theorem can be improved in many cases. For example, C.T.C. Wall proved all 3-manifolds embed in $\mathbb R^5$.
I don't know what the optimal (minimal-dimensional) Euclidean space is in which all $n$-manifolds embed, but Whitney's (strong) embedding theorem is best possible only for some values of $n$ (infinitely many, but not all). See Haefliger's work on embeddings; I believe he noticed many cases where you can improve on Whitney.
Your suggested improvement of Whitney's theorem, making the target a manifold rather than a Euclidean space, is in a sense asking for something much weaker than Whitney's theorem. For example, given any $n$-manifold, you can take its Cartesian product with $S^1$. Take the connect sum of all manifolds obtained this way. It's a giant, non-compact $(n+1)$-manifold, and all $n$-manifolds embed in it. This isn't so interesting.
Best Answer
Regarding question 1: yes, you can always ensure the image is closed. You prove the strong Whitney embedding theorem by perturbing a generic map $M \to \mathbb R^{2m}$ to an immersion, and then doing a local double-point creation/destruction technique called the Whitney trick. So instead of starting with an arbitrary smooth map $M \to \mathbb R^{2m}$, start with a proper map, i.e. one where the pre-image of every compact set is compact. You can then inductively perturb the map on an exhausting collection of compact submanifolds of $M$, making the map into an immersion that is also proper. A proper map into $\mathbb R^{2m}$ has closed image, so the resulting embedding does too.
Regarding question 2, generally speaking if a manifold is not compact the embedding problem is easier, not harder. Think of how your manifold is built via handle attachments. You can construct the embedding in $\mathbb R^4$ quite directly. Think of $\mathbb R^4$ with its standard height function $x \longmapsto |x|^2$, and assume the Morse function on $M$ is proper and takes values in $\{ x \in \mathbb R : x > 0 \}$. Then I claim you can embed $M$ in $\mathbb R^4$ so that the Morse function is the restriction of the standard Morse function. The idea is that every $0$-handle corresponds to creating a split unknot component in the level sets, etc.
edit: The level sets of the standard Morse function on $\mathbb R^4$ consist of spheres of various radii. So when you pass through a critical point (as the radius increases) either you are creating a split unknot component, doing a connect-sum operation between components (or the reverse, or a self-connect-sum), or you are deleting a split unknot component. By a split unknot component, I'm referring to the situation where you have a link in the $3$-sphere. A component is split if there is an embedded $2$-sphere that separates that component from all the other components of the link. So a split unknot component is one that bounds an embedded disc disjoint from the other components.
Regarding your last question, the Whitney embedding theorem isn't written up in many places since all the key ideas appear in the proof of the h-cobordism theorem. So Milnor's notes are an archetypal source. But Adachi's Embeddings and Immersions in the Translations of the AMS series is one of the few places where it occurs in its original context. You can find the book on Ranicki's webpage.