This is a typical top-down vs. bottom-up construction of a substructure. See the general discussion here.
Let $S\subseteq G$. We let
$$K = \bigcap_{S\subseteq M\leq G}M$$
and
$$H = \Bigl\{s_1^{\epsilon_1}\cdots s_m^{\epsilon_m}\mid m\geq 0,\ s_i\in S,\ \epsilon_i\in\{1,-1\}\Bigr\}.$$
We want to show that $K=H$.
Note that if $M$ is a subgroup of $G$, and $S\subseteq M$, then every element of $H$ must be in $M$, since $M$ is closed under products and inverses and contains every $s_i\in S$. Since this holds for every subgroup $M$ containing $S$, $H$ is contained in their intersection. Thus, $H\subseteq K$.
Conversely, to prove $K\subseteq H$, it suffices to show that $H$ is a subgroup of $G$ that contains $S$. To see that $S\subseteq H$, let $s\in S$. Letting $m=1$, $\epsilon_1=1$, and $s_1=s$ we have $s\in H$, so $S\subseteq H$.
To see that $H$ is a subgroup of $G$, note that $H$ is nonempty: selecting $m=0$ we obtain the empty product, which by definition is the identity of $G$. So $1\in H$.
Let $s_1^{\epsilon_1}\cdots s_m^{\epsilon_m}$ and $t_1^{\eta_1}\cdots t_n^{\eta_n}$, with $m,n\geq 0$, $\epsilon_i,\eta_j\in\{1,-1\}$, and $s_i,t_j\in S$, be elements of $H$. Then
$$\Bigl( s_1^{\epsilon_1}\cdots s_m^{\epsilon_m}\Bigr)\Bigl(t_1^{\eta_1}\cdots t_n^{\eta_n}\Bigr)^{-1} = r_1^{\chi_1}\cdots r_{n+m}^{\chi_{n+m}}$$
where
$$\begin{align*}
r_i &= \left\{\begin{array}{ll}
s_i &\text{if }1\leq i\leq m\\
t_{n+m-i+1} & \text{if }m\lt i\leq n+m
\end{array}\right.\\
\chi_i &= \left\{\begin{array}{ll}
\epsilon_i &\text{if }1\leq i\leq m\\
-\eta_{n+m-i+1} & \text{if }m\lt i\leq n+m
\end{array}\right.
\end{align*}$$
Note that $r_i\in S$ for each $i$, and $\chi_i\in\{1,-1\}$ for each $i$, so $r_1^{\chi_1}\cdots r_{n+m}^{\chi_{n+m}}$ is an element of $H$. Since $H$ is nonempty and $xy^{-1}\in H$ whenever $x,y\in H$, it follows that $H$ is a subgroup of $G$. It contains $S$, and so is one of the subgroups being intersected in the definition of $K$. Hence, $K\subseteq H$.
Since we already had $H\subseteq K$, it follows that $H=K$, as desired.
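The equality $H=K$ can be checked by brute force in a small finite group. The following is a minimal sketch, assuming $G=S_3$ with permutations stored as tuples; the function names `bottom_up` and `top_down` are illustrative choices, not part of the argument above. It builds $H$ by repeatedly multiplying by generators and their inverses, and builds $K$ by intersecting every subset of $G$ that is a subgroup containing $S$.

```python
from itertools import combinations, permutations

def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def is_subgroup(M, identity):
    # Nonempty and closed under x * y^{-1}: the test used in the proof.
    return identity in M and all(
        compose(x, inverse(y)) in M for x in M for y in M
    )

def bottom_up(S, identity):
    """H: all finite products of elements of S and their inverses."""
    letters = set(S) | {inverse(s) for s in S}
    H = {identity}          # the empty product (m = 0)
    frontier = {identity}
    while frontier:         # terminates because the group is finite
        new = {compose(w, x) for w in frontier for x in letters} - H
        H |= new
        frontier = new
    return H

def top_down(S, G, identity):
    """K: intersection of all subgroups M of G with S ⊆ M (brute force)."""
    K = set(G)
    elements = sorted(G)
    for size in range(1, len(elements) + 1):
        for subset in combinations(elements, size):
            M = set(subset)
            if set(S) <= M and is_subgroup(M, identity):
                K &= M
    return K

G = set(permutations(range(3)))   # the symmetric group S_3
identity = (0, 1, 2)

S = {(1, 2, 0)}                   # a single 3-cycle
H = bottom_up(S, identity)
K = top_down(S, G, identity)
print(sorted(H), H == K)
```

The brute-force `top_down` examines all $2^{|G|}$ subsets, so it is only usable for very small groups; it is there to mirror the definition of $K$, not to be efficient.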
Yes, you have $d_1(x,y) \geq d_2(x,y)$ in the above setup, but in general there is not much more to say. As mentioned in the comments, the relevant idea is distortion, which measures certain aspects of how the subgroup $H$ fits inside the group $G$. If you would like, you can look at this blog post, which discusses some of the ideas and defines the distortion function; intuitively, it compares the intrinsic geometry of the subgroup (its own word metric) with the geometry it inherits from the full group.
A simple example comes from a Baumslag-Solitar group $G=BS(1,2)= \langle a,t \mid tat^{-1} = a^2 \rangle$ (discussed in the above blog post, which you should look at; it has pictures). Consider $H=\langle a \rangle$ in $G$. We compute
$$t^nat^{-n}=t^{n-1} t a t^{-1}t^{-n+1}=t^{n-1} a^2 t^{-n+1}= (t^{n-1} a t^{-n+1}) (t^{n-1}a t^{-n+1})= \dots =a^{2^n} $$
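which gives $d_1(1,a^{2^n})=2^n$, while $d_2(1,a^{2^n})\leq 2n+1$ because the word $t^nat^{-n}$ has $2n+1$ letters. For $n\geq 3$ we have $2^n > 2n+1$, so the distance grows exponentially in $H$ but at most linearly in $G$, which is a pretty big difference in the geometry.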
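The identity $t^nat^{-n}=a^{2^n}$ can be checked numerically. The sketch below assumes the standard model of $BS(1,2)$ by affine maps of the line, with $a$ acting as $x\mapsto x+1$ and $t$ as $x\mapsto 2x$, encoded as $2\times 2$ matrices over the rationals; the helper names `mul`, `power`, and `conjugate` are illustrative. It also reads $d_1$ as the word metric on $H$ with respect to $\{a\}$ and $d_2$ as the word metric on $G$ with respect to $\{a,t\}$, as the inequalities above suggest.

```python
from fractions import Fraction

# Assumed model: the matrix ((p, q), (0, 1)) stands for the map x -> p*x + q.
A = ((Fraction(1), Fraction(1)), (Fraction(0), Fraction(1)))         # a: x -> x + 1
T = ((Fraction(2), Fraction(0)), (Fraction(0), Fraction(1)))         # t: x -> 2x
T_INV = ((Fraction(1, 2), Fraction(0)), (Fraction(0), Fraction(1)))  # t^{-1}: x -> x/2
IDENTITY = ((Fraction(1), Fraction(0)), (Fraction(0), Fraction(1)))

def mul(X, Y):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def power(X, k):
    """X multiplied by itself k times, for k >= 0."""
    result = IDENTITY
    for _ in range(k):
        result = mul(result, X)
    return result

def conjugate(n):
    """The element t^n a t^{-n}."""
    return mul(mul(power(T, n), A), power(T_INV, n))

for n in range(1, 7):
    # The defining relation t a t^{-1} = a^2, applied n times.
    assert conjugate(n) == power(A, 2 ** n)
    # Length of a^(2^n) in H versus the length of the word t^n a t^{-n} in G.
    print(n, 2 ** n, 2 * n + 1)
```

The loop prints, for each $n$, the length $2^n$ in $H$ next to the upper bound $2n+1$ for the length in $G$.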
Now sometimes you can say more, although normally you "coarse-ify" things up to some sort of equivalence so that the choice of generating sets does not change the answer. For example, in hyperbolic groups and CAT(0) groups it is known that abelian subgroups are undistorted (quasi-isometrically embedded).
Here are two reasons the group generated by $U$ is defined as the set of finite products, which I will denote by $\Pi(U)$:
1.) The group generated by $U$ is usually taken to be the smallest group containing $U$; it is evident the set $\Pi(U)$ satisfies this criterion, since it is clearly closed under the group operation (finite products of finite products of elements of $U$ are, after all, themselves finite products of elements of $U$) and the taking of inverses, and contains the identity element $e$ since
$$e = xx^{-1}, \; x \in U; \tag 1$$
thus $\Pi(U)$ is a group; and any group containing $U$ must contain $\Pi(U)$ if it is to be closed under the group operation and inversion. Indeed, $\Pi(U)$ is often thought of as the intersection of all groups containing $U$; in this sense it is the smallest group containing $U$.
2.) We really can't define infinite products of elements of $U$ anyway, in a purely algebraic sense; to do so generally requires some notion of convergence of a sequence of products such as
$$x_1x_2, x_1x_2x_3, x_1x_2x_3x_4, \ldots; \tag 2$$
but convergence lies in the realm of topology, so we would have to adopt some appropriate topological structure to give meaning to such infinite products.
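To illustrate why a topology is needed, here is a small sketch under an assumed setting that is not part of the question: the group of nonzero rationals under multiplication, with the usual notion of convergence borrowed from the real line. Every finite partial product of the factors $x_k=\tfrac12$ lies in the group, but the sequence of partial products tends to $0$, which is not a group element, so the "infinite product" has no value inside the group.

```python
from fractions import Fraction
from itertools import accumulate
from operator import mul

# Assumed example: twenty factors x_k = 1/2 in the multiplicative group of
# nonzero rationals.
factors = [Fraction(1, 2)] * 20

# Partial products x_1, x_1*x_2, x_1*x_2*x_3, ...
partial_products = list(accumulate(factors, mul))

# Each partial product is a nonzero rational, hence a group element.
print(all(p != 0 for p in partial_products))

# The partial products shrink toward 0, which is not in the group.
print(float(partial_products[-1]))
```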
Well, those are two of my main reasons for accepting the definition of the group generated by $U$ as $\Pi(U)$. The comment stream attached to the question itself contains more useful insights, cf. the remarks of ThorWitch and Captain Lama.