As @InTransit suggested in a comment, O'Neill's phrase "and defined on an open subset of $\mathbb R^m$" seems to take care of the problem. Note that the subset of $\mathbb R^m$ on which $\psi\circ f \circ \varphi^{-1}$ is defined is
$(f\circ\varphi^{-1})^{-1}(V) = \varphi(f^{-1}(V)\cap U)$. O'Neill's definition stipulates that this set is open in $\mathbb R^m$. Because $\varphi$ is, in particular, a homeomorphism from an open subset of $M$ to an open subset of $\mathbb R^m$, this implies that $f^{-1}(V)\cap U$ is open in $M$. Knowing this, we can replace $U$ by $U_0 = f^{-1}(V)\cap U$ and $\varphi$ by $\varphi|_{U_0}$, and then smoothness by O'Neill's definition implies smoothness by mine.
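As a sketch of the missing step, the set identity above can be checked pointwise. This assumes only that $\varphi\colon U\to\varphi(U)$ is a bijection, so that $f\circ\varphi^{-1}$ has domain $\varphi(U)$:
$$
\begin{aligned}
y\in (f\circ\varphi^{-1})^{-1}(V)
&\iff y\in\varphi(U)\ \text{and}\ f(\varphi^{-1}(y))\in V\\
&\iff y\in\varphi(U)\ \text{and}\ \varphi^{-1}(y)\in f^{-1}(V)\cap U\\
&\iff y\in\varphi\bigl(f^{-1}(V)\cap U\bigr).
\end{aligned}
$$
Openness then transfers back to $M$ because $f^{-1}(V)\cap U=\varphi^{-1}\bigl(\varphi(f^{-1}(V)\cap U)\bigr)$ is the preimage of an open set under the continuous map $\varphi$. It is therefore open in $U$, and hence in $M$, since $U$ is open.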
If you omit the requirement that the composite map be defined on an open set, but only require, as the OP suggested, that $\psi\circ f \circ \varphi^{-1}$ have a smooth extension to an open neighborhood of each point, then you won't necessarily get continuity. A counterexample (taken from Problem 2-1 in my smooth manifolds book, 2nd ed.) is the function $f\colon\mathbb R\to \mathbb R$ defined by
$$
f(x) = \begin{cases}
1, &x\ge 0,\\
0, &x< 0.
\end{cases}
$$
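One way to check that this $f$ does the job is sketched below. It takes $U=\mathbb R$ with $\varphi=\psi=\mathrm{id}$; the intervals $V$ are chosen here purely for illustration and are not taken from the book:
$$
\begin{aligned}
x\ge 0:&\quad V=(\tfrac12,\tfrac32),\quad U\cap f^{-1}(V)=[0,\infty),\quad \psi\circ f\circ\varphi^{-1}\equiv 1,\\
x<0:&\quad V=(-\tfrac12,\tfrac12),\quad U\cap f^{-1}(V)=(-\infty,0),\quad \psi\circ f\circ\varphi^{-1}\equiv 0.
\end{aligned}
$$
In both cases the composite is constant on its domain, so it extends to a constant (hence smooth) function on all of $\mathbb R$. Yet $f$ is not continuous at $0$, since $f^{-1}\bigl((\tfrac12,\tfrac32)\bigr)=[0,\infty)$ is not open. That non-open domain is exactly what the openness requirement rules out.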
I am not absolutely sure, but I believe the point of do Carmo's (2) is to "import" the topology from $\mathbb{R}^n$. If $W$ is open, then continuity of the coordinate maps forces the preimage of $W$ to be open. Declaring these $W$ to be open essentially gives a topology on $M$.
The fact that do Carmo's coordinate maps go the opposite way is irrelevant, since they are bijective onto their images. They are defined to be injective, so restricting each codomain to the range makes the map bijective. Their domains are open and their ranges are declared to be open, so they are continuous in both directions and hence homeomorphisms. For a homeomorphism, the "initial direction" of the mapping is irrelevant.
To help you see that these are the same, I will give you a complete definition of smooth manifolds as it is usually done (and also as it is done in Lee's book):
We first define topological manifolds.
Definition:
A real topological manifold of dimension $n$ is a set $M$ such that
a) $M$ is equipped with a topology, $\tau$;
b) $\tau$ is Hausdorff and second countable;
c) $M$ is locally euclidean.
By c) I mean that for every point $p\in M$ there is an open set $U\in\tau$ containing $p$ and a homeomorphism $\varphi:U\rightarrow\varphi(U)$ onto an open subset $\varphi(U)\subseteq\mathbb{R}^n$.
Since $p$ is arbitrary, there must be enough of these sets $U$ to cover $M$. The pair $(U,\varphi)$ is called a chart. A set of charts $\{(U_\alpha,\varphi_\alpha)\}_{\alpha\in\mathbb{A}}$ that covers $M$, i.e. $\bigcup_\alpha U_\alpha=M$, is called an atlas.
We say that an atlas is $C^k$ if, for any two $U_\alpha$, $U_\beta$ with nonempty intersection, the map $\varphi_\beta\circ\varphi^{-1}_\alpha:\varphi_\alpha(U_\alpha\cap U_\beta)\rightarrow\varphi_\beta(U_\alpha\cap U_\beta)$ is $C^k$.
We call two atlases $\{(U_\alpha,\varphi_\alpha)\}$ and $\{(V_\beta,\psi_\beta)\}$ $C^k$-compatible if their union is a $C^k$ atlas.
A $C^k$ atlas $\mathcal{A}$ is maximal if it contains every $C^k$ atlas that is $C^k$-compatible with it. Maximal atlases are needed so that the same manifold equipped with two different but $C^k$-compatible atlases is not counted as two different manifolds. A maximal $C^k$ atlas is what we call a $C^k$ differentiable structure.
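A standard illustration of why compatibility is a genuine condition follows. The two charts are chosen for this sketch and do not come from the discussion above. On $M=\mathbb R$ take the single-chart atlases $\{(\mathbb R,\varphi)\}$ and $\{(\mathbb R,\psi)\}$ with $\varphi(x)=x$ and $\psi(x)=x^3$. Each is a $C^k$ atlas on its own, since its only transition map is the identity. The mixed transition maps are
$$
\psi\circ\varphi^{-1}(t)=t^3,\qquad \varphi\circ\psi^{-1}(t)=t^{1/3}.
$$
The second map is not differentiable at $t=0$, so for $k\ge 1$ the two atlases are not $C^k$-compatible. They therefore lie in different maximal atlases and determine different $C^k$ differentiable structures on $\mathbb R$.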
Then we define $M$ to be a real, $n$-dimensional $C^k$ manifold if $M$ is a real, $n$-dimensional topological manifold, with a maximal $C^k$ atlas on it.
As you can see, this definition is the same as Wikipedia's, but broadened and clarified. It is not hard to see that do Carmo's definition is also the same: his (1) is the requirement that charts cover the manifold, his (2) defines the topology, the atlas, and its differentiability, and his (3) extends the atlas maximally.
The definition in Milnor and Stasheff is a bit of a hybrid between a purely "coordinate chart" definition you alluded to (requiring transition functions to be smooth), and a purely Euclidean space definition, which runs as follows:
An $n$-dimensional manifold is a subset of $\mathbb{R}^A$ (here $A$ may be much bigger than $n$) such that each point has a neighborhood which is the graph of a differentiable function over a suitable coordinate subspace $\mathbb{R}^n\subset\mathbb{R}^A$.
Note that in this definition we only need the standard coordinate planes (it is not necessary to take arbitrary subspaces), by the implicit function theorem.
For example, the circle in the plane is, near every point, the graph of a function of type $\pm\sqrt{1-t^2}$, either over the $x$-axis or over the $y$-axis.
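Spelled out as a sketch for the unit circle $S^1=\{(x,y): x^2+y^2=1\}$ (so $n=1$ and $A=2$), four graphs suffice, each defined for the free variable in $(-1,1)$:
$$
\begin{aligned}
S^1\cap\{y>0\}:&\quad y=\sqrt{1-x^2}, &\qquad S^1\cap\{y<0\}:&\quad y=-\sqrt{1-x^2},\\
S^1\cap\{x>0\}:&\quad x=\sqrt{1-y^2}, &\qquad S^1\cap\{x<0\}:&\quad x=-\sqrt{1-y^2}.
\end{aligned}
$$
Every point of $S^1$ has $x\neq 0$ or $y\neq 0$, so these four open arcs cover the circle. Each defining function is smooth on $(-1,1)$.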
The "coordinate chart" definition has the advantage that no a priori structure is assumed on $M$ (other than being a set). In particular, the topology results from the smooth structure imposed by the transition functions.
From this point of view, the Milnor-Stasheff definition has the disadvantage that we must already know about topological spaces and the notion of a homeomorphism.