First of all, if $x:U\subset \mathbb R^2\rightarrow S$ is a parametrization, then $x^{-1}: x(U) \rightarrow \mathbb R^2$ is differentiable. Indeed, by the very definition of a differentiable map on a surface, it is enough to compose $x^{-1}$ with a parametrization of the open set $x(U)$. Taking $x$ itself, $x^{-1}\circ x$ is the identity map of $U$, which is differentiable.
Now, let $p$ be a point on the surface $S$, $x:U\subset \mathbb R^2\rightarrow S$ be a parametrization s.t. $x(0)=p$ and $y:V\subset \mathbb R^2\rightarrow S$ be another parametrization s.t. $L(p)=y(0)$.
To make it clear, let's say that $x(u,v)=(x_1(u,v),x_2(u,v),x_3(u,v))$ and $y^{-1}(x,y,z)=(\varphi_1(x,y,z),\varphi_2(x,y,z))$ then the map $L\circ x:U\rightarrow S$ is given by : $$L\circ x (u,v)=\begin{pmatrix} a&b&c\\d&e&f \\g&h&i\end{pmatrix}\begin{pmatrix} x_1(u,v) \\ x_2(u,v) \\ x_3(u,v) \end{pmatrix}$$
So $f(u,v)=y^{-1}\circ L \circ x(u,v)$ looks like $$f(u,v)=y^{-1}\circ L \circ x(u,v)= \begin{pmatrix}\varphi_1(ax_1(u,v)+bx_2(u,v)+cx_3(u,v),\cdots,gx_1(u,v)+hx_2(u,v)+ix_3(u,v)) \\ \varphi_2(ax_1(u,v)+bx_2(u,v)+cx_3(u,v),\cdots,gx_1(u,v)+hx_2(u,v)+ix_3(u,v))\end{pmatrix}$$
which is clearly differentiable.
Moreover, you can easily check using the chain rule that $$df_0=d(y^{-1})_{L(p)}\circ L \circ dx_0.$$
Roughly speaking, this map is the composition $$\mathbb R^2 \underset{dx}{\longrightarrow} T_pS \underset{L}{\longrightarrow} T_{L(p)}S\underset{dy^{-1}}{\longrightarrow} \mathbb R^2$$
which means that you send a vector of $\mathbb R^2$ onto $T_pS$ using the parametrization $x$ (it always gives you a good basis of the tangent space), then $L$ acts, and you read the information again using the second parametrization $y$, which takes the new vector back to $\mathbb R^2$.
So $L$ is nothing else but the derivative of $L:S\rightarrow S$ as a map between two surfaces.
In fact, this is to be expected: you might know that the derivative of a linear map between two vector spaces does not depend on the point and is equal to the map itself, so the same has to hold for its restriction to a surface, or to a submanifold in general.
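As an illustration, here is a worked instance under assumptions that are not part of the question: $S$ is the unit sphere, $L$ is the rotation by an angle $\theta$ about the $z$-axis, $p=(0,0,1)$, and $x=y$ is the graph parametrization $(u,v)\mapsto(u,v,\sqrt{1-u^2-v^2})$ of the upper hemisphere, so that $y^{-1}$ is the restriction of the projection onto the first two coordinates. Then $L(p)=p=y(0)$ and
$$f(u,v)=y^{-1}\circ L\circ x(u,v)=\begin{pmatrix} u\cos\theta-v\sin\theta \\ u\sin\theta+v\cos\theta\end{pmatrix},\qquad df_0=\begin{pmatrix}\cos\theta&-\sin\theta\\ \sin\theta&\cos\theta\end{pmatrix}.$$
The chain rule gives the same matrix, since the partial derivatives of $\sqrt{1-u^2-v^2}$ vanish at the origin:
$$d(y^{-1})_{L(p)}\circ L\circ dx_0=\begin{pmatrix}1&0&0\\0&1&0\end{pmatrix}\begin{pmatrix}\cos\theta&-\sin\theta&0\\ \sin\theta&\cos\theta&0\\0&0&1\end{pmatrix}\begin{pmatrix}1&0\\0&1\\0&0\end{pmatrix}=\begin{pmatrix}\cos\theta&-\sin\theta\\ \sin\theta&\cos\theta\end{pmatrix}.$$
In these coordinates the differential of $L$ at $p$ is just $L$ acting on the tangent plane $\{z=0\}$.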
First part of your question
Basically it's all to do with the inverse function theorem. But more explicitly:
Let $\mathbf{x}:U \subset \mathbb{R}^2 \rightarrow S$ be a parameterization and $q$ be a point in $U$.
Define $\mathbf{F}:U \times I \rightarrow \mathbb{R}^3$, where $I$ is an open interval containing $0$, by
$$
\mathbf{F}(u,v,t):= \mathbf{x}(u,v) + t\cdot \hat{e}_3, \hspace{1in} t \in I.
$$
Another way to think about $\mathbf{F}$ is as follows:
If we know that $\mathbf{x}(u,v) = \langle x(u,v), y(u,v), z(u,v) \rangle$, then
$$
\mathbf{F}(u,v,t) = \langle x(u,v), y(u,v), z(u,v) +t \rangle.
$$
So, given that $\mathbf{x}(u,v)$ is a parameterization, we know that it must be differentiable, thus $\mathbf{F}(u,v,t)$ is also differentiable.
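Furthermore, since $\mathbf{x}(u,v)$ is a parameterization, $d\mathbf{x}_q$ is one-to-one, so at least one of the $2\times 2$ minors of its Jacobian matrix is nonzero at $q$. Relabeling the axes of $\mathbb{R}^3$ if necessary, we can assume without loss of generality that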
$$
\dfrac{\partial(x,y)}{\partial(u,v)}(q) \not= 0.
$$
Now take the determinant of the differential of $\mathbf{F}(u,v,t)$ at the point $(q,0)$:
$$
\det\left(d\mathbf{F}_{(q,0)}\right) = \left| \begin{array}{ccc}
x_u & x_v & 0 \\
y_u & y_v & 0 \\
z_u & z_v & 1 \end{array} \right|_{q} = \dfrac{\partial(x,y)}{\partial(u,v)}(q).
$$
Thus $\det\left(d\mathbf{F}_{(q,0)}\right) \not=0$, which is exactly the hypothesis of the inverse function theorem. Hence, there exists an open neighborhood, say $M$, of $\mathbf{F}(q,0) = \mathbf{x}(q)$ in $\mathbb{R}^3$ on which $\mathbf{F}^{-1}$ exists and is differentiable.
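But if we set $t=0$ in the expression for $\mathbf{F}(u,v,t)$, then $\mathbf{x}(u,v) = \mathbf{F}(u,v,0)$. Hence, near $\mathbf{x}(q)$, $\mathbf{x}^{-1}$ agrees with the restriction to the surface of $\pi \circ \mathbf{F}^{-1}$, where $\pi(u,v,t) = (u,v)$, so $\mathbf{x}^{-1}$ is differentiable around $\mathbf{x}(q)$. Since $q$ was arbitrarily chosen, this holds at every point of $\mathbf{x}(U)$. Hence $\mathbf{x}^{-1}:\mathbf{x}(U) \rightarrow \mathbb{R}^2$ is differentiable.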
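To see the construction on a concrete case, here is a sketch for the paraboloid, which is my own choice of example and not part of the question: take $\mathbf{x}(u,v) = \langle u, v, u^2+v^2 \rangle$ on $U = \mathbb{R}^2$. Then
$$
\mathbf{F}(u,v,t) = \langle u, v, u^2+v^2+t \rangle, \qquad
\det\left(d\mathbf{F}_{(u,v,t)}\right) = \left| \begin{array}{ccc}
1 & 0 & 0 \\
0 & 1 & 0 \\
2u & 2v & 1 \end{array} \right| = 1,
$$
and in this case the inverse can be written down explicitly:
$$
\mathbf{F}^{-1}(x,y,z) = \langle x, y, z - x^2 - y^2 \rangle.
$$
A point of the surface satisfies $z = x^2+y^2$, so $\mathbf{F}^{-1}$ sends it to $\langle x, y, 0 \rangle$, and dropping the last coordinate gives $\mathbf{x}^{-1}(x,y,z) = (x,y)$, which is the restriction of a differentiable map on $\mathbb{R}^3$.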
Second part of your question
By definition, $U$ and $\mathbf{x}(U)$ are diffeomorphic if there exists a diffeomorphism between them. We already know that $\mathbf{x}$ is differentiable and invertible by hypothesis, and we just showed that $\mathbf{x}^{-1}$ is differentiable. Thus $\mathbf{x}$ is a diffeomorphism, so $U$ and $\mathbf{x}(U)$ are diffeomorphic to each other.
Remarks
I like to think about $U$ as a sheet of rubbery paper, and $\mathbf{x}(U)$ as me bending the paper to look like the surface. Then $\mathbf{x}^{-1}$ is just me bending the paper back into its original shape. Moreover, you want to make sure that your logic does not depend on this special sheet of paper; so you take another rubbery sheet of paper, $V$, and bend that into the shape of the surface as well by $\mathbf{y}(V)$. Now I like to think that if I can bend the original rubbery sheet of paper, $U$, into the other rubbery sheet of paper, $V$, then I have all my bases covered and my logic does not depend on any particular sheet. The Change of Parameters theorem basically tells us that you can bend $U$ into $V$ by $\mathbf{y}^{-1} \circ \mathbf{x}: U \rightarrow V$, or vice versa, if you want to bend $V$ into $U$, by $\mathbf{x}^{-1} \circ \mathbf{y}: V \rightarrow U$.
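Here is a small worked case of such a change of parameters. The two parametrizations are my own choice for illustration: both cover the open upper hemisphere of the unit sphere, and $U$ and $V$ are both the open unit disc. Let $\mathbf{x}$ be the graph parametrization and $\mathbf{y}$ the inverse of stereographic projection from the south pole:
$$
\mathbf{x}(u,v) = \left\langle u, v, \sqrt{1-u^2-v^2} \right\rangle, \qquad
\mathbf{y}(s,t) = \left\langle \dfrac{2s}{1+s^2+t^2}, \dfrac{2t}{1+s^2+t^2}, \dfrac{1-s^2-t^2}{1+s^2+t^2} \right\rangle.
$$
Since $\mathbf{y}^{-1}(x,y,z) = \left( \dfrac{x}{1+z}, \dfrac{y}{1+z} \right)$ and $\mathbf{x}^{-1}(x,y,z) = (x,y)$, the two changes of parameters are
$$
\mathbf{y}^{-1} \circ \mathbf{x}(u,v) = \dfrac{(u,v)}{1+\sqrt{1-u^2-v^2}}, \qquad
\mathbf{x}^{-1} \circ \mathbf{y}(s,t) = \dfrac{(2s,2t)}{1+s^2+t^2}.
$$
Both are differentiable on the open unit disc and they are inverse to each other, so each one bends one sheet into the other.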
Best Answer
I think your questions will be answered most clearly if we try to identify the assumptions under which the definitions you wrote for differentiability make sense. The first thing to note is, like you wrote, that the first definition tells you when a map is differentiable and what its differential is, while the second definition only tells you when a map is differentiable. A differentiable map from a surface also has a differential, but it is usually discussed separately from the definition of differentiability.
The first definition you have given makes sense for functions $f \colon D \rightarrow \mathbb{R}$ defined on an open subset $D \subseteq V$ of some normed vector space $(V, ||\,||)$. We write $x - a$, and we require the map $L \colon V \rightarrow \mathbb{R}$ to be linear, so $V$ had better have the structure of a vector space, and we use the norm to make sense of the limit and to estimate the size of $x - a$. The interpretation of $L$ we have in mind is that, given $v \in V$ with $||v|| = 1$, $Lv = \frac{d}{dt} f(a + tv)|_{t =0}$ is the directional derivative of $f$ in the direction of $v$.
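As a concrete check, assume the first definition is the usual limit condition $\lim_{x \to a} \frac{|f(x) - f(a) - L(x-a)|}{||x-a||} = 0$ (the function below is chosen only for illustration). Take $V = \mathbb{R}^2$ with the Euclidean norm, $f(x,y) = x^2 + y$ and $a = (1,0)$. The candidate $L(h_1,h_2) = 2h_1 + h_2$ works, because
$$
f(a+h) - f(a) - L(h) = (1+h_1)^2 + h_2 - 1 - 2h_1 - h_2 = h_1^2, \qquad \frac{|h_1^2|}{||h||} \le ||h|| \to 0.
$$
For a unit vector $v = (v_1,v_2)$ we also get $\frac{d}{dt} f(a+tv)|_{t=0} = 2v_1 + v_2 = Lv$, in agreement with the directional derivative interpretation.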
This definition makes sense even if $D$ is not open, provided we take the limit in $D$, treating $D$ for example as a metric space with the metric induced from the norm $||\,||$. However, this raises many problems. For example, if $D = \{ (x,y,z) \, | \, z = 0 \} \subset \mathbb{R}^3$, then the derivative $L$ won't be unique. The limit $x \to a$ is taken in $D$, so $x - a$ always lies in $D$, and thus the term $L(x-a)$ that appears in the limit depends only on how $L$ acts on the subspace $D$ and not on the whole of $\mathbb{R}^3$. The problem is that in $D$ one can approach a point $a$ using only directions that lie in the $xy$-plane, so it doesn't make sense to require a priori that $L$ be defined on the whole vector space $\mathbb{R}^3$; it should be defined only on the directions that are relevant to the limit. Worse, we can take $D$ to be something like $\{ (x,|x|,0) \, | \, x \in \mathbb{R} \}$, and then there are only two directions from which we can approach $a = (0,0,0)$ inside $D$, so it doesn't make sense to encode the directional derivatives in an operator that is defined on a vector space.
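To make the non-uniqueness explicit, here is a minimal example with a function of my own choosing: let $D = \{ (x,y,z) \, | \, z = 0 \}$, $f(x,y,z) = x$ on $D$ and $a = (0,0,0)$. For every constant $c \in \mathbb{R}$, define the linear map $L_c(h_1,h_2,h_3) = h_1 + c\,h_3$. Any $x \in D$ has the form $x = (h_1,h_2,0)$, so
$$
f(x) - f(a) - L_c(x-a) = h_1 - 0 - (h_1 + c \cdot 0) = 0 \qquad \text{for all } x \in D.
$$
Hence every $L_c$ satisfies the limit condition taken inside $D$, and the value of $L$ on the direction $(0,0,1)$ is not determined by $f$.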
That is why $D$ is taken to be an open set: then each point $a \in D$ can be approached inside $D$ through all possible directions, and we say that $f$ is differentiable at $x = a$ if all possible directional derivatives can be "encoded uniformly in a linear operator" $L \colon \mathbb{R}^3 \rightarrow \mathbb{R}$.
Now, if $S \subseteq \mathbb{R}^3$ is a regular surface, it is never open in $\mathbb{R}^3$, so we need a different approach. We think of a regular surface as a two-dimensional object, so to check the differentiability of $f$ at $p$, we intuitively know that we need to check it only with respect to the directions from which the point $p$ can be approached inside $S$. The easiest definition is then to compose $f$ with a coordinate chart around $p$ (which turns $f$ into a function of two variables) and to say that $f$ is differentiable if the composition is differentiable.
Connecting to what I wrote before, if you read further, you'll see that the fact that $S$ is a regular surface and not an arbitrary subset of $\mathbb{R}^3$ guarantees that at each $p \in S$ there is a two-dimensional affine vector subspace of $\mathbb{R}^3$, called the tangent plane, that consists of the velocities ("directions") of all the curves that pass through $p$ and live on $S$. This tangent plane depends on the point $p \in S$ and changes when we move $p$ around. The differential of $f$ at $p \in S$ will be defined as a linear map $df_p \colon T_pS \rightarrow \mathbb{R}$, and if $f$ is the restriction of a differentiable map $\tilde{f} \colon \mathbb{R}^3 \rightarrow \mathbb{R}$, then under appropriate identifications, $df_p$ will be the restriction of $d\tilde{f}_p$ to the two-dimensional subspace of "relevant" directions tangent to $S$.
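For instance (an example of my own, with $T_pS$ identified with the plane of vectors through the origin orthogonal to $p$), let $S$ be the unit sphere and let $f$ be the restriction to $S$ of the height function $\tilde{f}(x,y,z) = z$. Then $d\tilde{f}_p(w_1,w_2,w_3) = w_3$ at every point, and
$$
df_p(w) = w_3 \quad \text{for } w \in T_pS = \{ w \in \mathbb{R}^3 \, | \, \langle w, p \rangle = 0 \}.
$$
At $p = (1,0,0)$ the tangent plane is $\{ w_1 = 0 \}$ and $df_p(0,w_2,w_3) = w_3$, while at the north pole $p = (0,0,1)$ the tangent plane is $\{ w_3 = 0 \}$ and so $df_p = 0$, even though $d\tilde{f}_p \neq 0$ as a map on all of $\mathbb{R}^3$. Only the directions tangent to $S$ are seen by $df_p$.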