A rigorous proof that the infinite product of countably many 2-point spaces $A_n = \{0,1\}$ does not exist: (Really, just an elaboration of Martin's comment.)
Assume that $P$ is such a product, then $P$ must be $\prod_n A_n$ as a set. Let $e_n$ be the distance (in $P$) between the $0$-function and the function $x_n$ that takes the value $1$ only in the $n$-th place. The sequence $(e_n)$ must converge to $0$, because the $x_n$ will converge to the $0$ function. (They must converge by compactness, and the continuity of the projections means that all entries of the limit must be $0$.)
[EDIT: As Martin Brandenburg pointed out, compactness is not enough to show convergence of the whole sequence, but only of some subsequence $(e_{n_i})$. For notational simplicity, assome $n_i=i$ for all $i$; otherwise restrict attention from $(x_n)$, $(e_n)$, $(d_n)$ to the respective subsequences $(x_{n_i}$, $(e_{n_i})$, $(d_{n_i})$ from now on.]
Let $(d_n)$ be a sequence of real numbers converging to $0$, but more slowly than $(e_n)$, i.e., $e_n = o(d_n)$. (E.g., $e_n=\sqrt{d_n}$.) Let $Q = \prod A_n$, but with the following metric: $d(x,y) = e_n$, if $x$ and $y$ agree on the first $n$ values, but not on the next one. Project $Q$ to each $A_n$ $-$ these maps are Lipschitz-continuous. So they factor through a map from $Q$ to $P$. Now this map must be the identity (apply the forgetful functor); but points with distance $e_n$ are then mapped to points with distance $d_n$, which is not possible for a Lipschitz function.
As k.stm says in the comments, usually a more general thing is true: these categories $C$ are equipped with forgetful / underlying set functors $U : C \to \text{Set}$ which tend to have a left adjoint, the "free" functor $F : \text{Set} \to C$. Whenever this is true, it follows that $U$ preserves all limits, not just products.
Sometimes, but more rarely, $U$ will also have a right adjoint, which will give a "cofree" functor. Whenever this is true, it follows that $U$ preserves all colimits, not just coproducts. This happens, for example, when $C = \text{Top}$: here the left adjoint equips a set with the discrete topology and the right adjoint equips a set with the indiscrete topology. But it doesn't happen for, say, groups or rings.
So one way to reformulate your question is:
Why do the forgetful functors $U : C \to \text{Set}$ we write down tend to have left adjoints, but not right adjoints? Equivalently, why are there usually free structures, but not usually cofree structures?
A rough answer is that we can expect "free" structures whenever the structure is described by operations satisfying equational axioms, since then we can build free objects by applying all possible operations modulo all axioms. On the other hand, operations also make it difficult for the forgetful functor to preserve coproducts, since in a coproduct of two structures you can apply operations to elements of both structures, so you'll usually get something bigger than the disjoint union.
(Dually, you should expect "cofree" structures whenever the structure is described by "co-operations," and this does in fact happen: for example, the forgetful functor from coalgebras to vector spaces has a right adjoint but not a left adjoint, called the cofree coalgebra.)
A more precise answer would invoke, say, the machinery of Lawvere theories, which among other things has the benefit of also telling you exactly what colimits you can expect these forgetful functors $U$ to preserve. This is a long story so I don't want to get into it unless you feel like it really answers your question, but the gist is that Lawvere theories present familiar structures like groups, rings, and modules in a way that fundamentally uses finite products, but nothing else. You can deduce from this that the forgetful functor $U$ preserves (and in fact creates) any limits or colimits that commute with finite products in $\text{Set}$. Every limit commutes with finite products, and the colimits that commute with finite products in $\text{Set}$ are precisely the sifted colimits. These include, for example, increasing unions, which is an abstract way to see why the set-theoretic increasing union of a sequence of groups is still a group, and the same with groups replaced by rings, modules, etc.
Best Answer
Here's an argument for the first question. Take $X=Y=S=\mathbb R$ and take $x\mapsto x^2$ for the maps $X,Y\to S$. As you note, the underlying set of $X\times_S Y$ is what you'd expect, $\{(x,y)\mid x=\pm y\}$ (at least up to bijection, and so without loss of generality).
Lemma: Let $Z$ be a subset of $\{(x,y)\mid x=\pm y\}$ such that $Z$ defines a smooth manifold $M$ in the usual topology coming from $X\times Y$. Then the subspace topology induced on $Z$ from $X\times_S Y$ is the usual topology.
Proof: By the universal property of $X\times_S Y$ (and using morphisms from the point), the inclusion $M\to X\times Y$ factorises in $\mathsf{Man}$ into two inclusions: $$M\to X\times_S Y\to X\times Y$$ Restricting to $Z$ we get continuous inclusions which we can denote $M\to M'\to M$. But this implies $M=M'$. $\square$
In particular, applying the lemma three times, with $Z$ equal to $\{(x,y)\mid x=\pm y\}\setminus\{(0,0)\}$ and $\{(x,x)\mid x\in\mathbb R\}$ and $\{(x,-x)\mid x\in\mathbb R\}$, we find that $X\times_S Y$ induces the usual topology on these sets. So $X\times_S Y$ is connected but $X\times_S Y\setminus\{(0,0)\}$ has four connected components. This cannot happen with smooth manifolds: removing a point from a manifold increases the number of connected components by at most $1$.