For the past few days I've been looking for a rigorous proof of the multivariable chain rule, and I've finally found one that I think is very easy to understand. I will leave it here (if nobody minds) for anybody searching for this who is not familiar with little-o notation, Jacobians and the like. To understand this proof, all you need to know is the mean value theorem.
Let's say we have a function $f(x,y)$ with $x = x(t)$ and $y = y(t)$, and let $z(t) = f(x(t), y(t))$. By definition, the derivative $z'(t)$ is
$$ z'(t) = \lim_{\Delta t \to 0}{\frac {f(x(t+\Delta t),y(t+\Delta t)) - f(x,y)}{\Delta t}}. $$
Let
$$ \Delta x = x(t+\Delta t)-x(t), \qquad \Delta y = y(t+\Delta t)-y(t). $$
Now I'll take the numerator of the fraction in the limit, and make a small change.
$$ f(x(t+\Delta t), y(t+\Delta t)) - f(x,y) = f(x+\Delta x, y+\Delta y) - f(x,y)$$
$$ = \left[f(x+\Delta x, y+\Delta y) - f(x+\Delta x, y)\right] + \left[f(x+\Delta x, y) - f(x, y)\right]$$
I have just added and subtracted $f(x+\Delta x, y)$. For convenience, I will swap the two brackets:
$$ = \left[f(x+\Delta x, y) - f(x, y)\right] + \left[f(x+\Delta x, y+\Delta y) - f(x+\Delta x, y)\right]. $$
Now let's define two functions, which I will call $g$ and $h$. First, let
$$ g(x) = f(x, y) \implies g'(x) = \frac {\partial f} {\partial x}(x, y). $$
Please note that y is constant here since g is a function of a single variable. Now, by the mean value theorem we have
$$ \exists\, c_1 \in (x, x+\Delta x) \ \text{such that} $$
$$\frac {g(x+\Delta x) - g(x)} {\Delta x} = g'(c_1) $$
$$ \Longleftrightarrow $$
$$ f(x+\Delta x, y) - f(x, y) = f_x(c_1, y)\Delta x$$
Similarly, using the function
$$ h(y) = f(x + \Delta x, y) \implies h'(y) = \frac {\partial} {\partial y}f(x+\Delta x, y)$$
We will have by the same logic that
$$ f(x+\Delta x, y + \Delta y) - f(x+\Delta x, y) =
f_y(x + \Delta x, c_2)\Delta y, \qquad c_2 \in (y, y+\Delta y) $$
Notice that $c_1$ lies between $x$ and $x+\Delta x$, and $c_2$ lies between $y$ and $y+\Delta y$.
So as $\Delta x \to 0$, $c_1 \to x$, and as $\Delta y \to 0$, $c_2 \to y$. Since $x(t)$ and $y(t)$ are differentiable, they are continuous, so as $\Delta t \to 0$ both $\Delta x \to 0$ and $\Delta y \to 0$. Hence, as $\Delta t \to 0$, $c_1 \to x$ and $c_2 \to y$.
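To see the claim $c_1 \to x$ in action, here is a minimal numerical sketch. It assumes a hypothetical example $f(x,y) = x^3 y$ with $x, y > 0$ and $\Delta x > 0$ (these choices are mine, not part of the proof), and locates the mean value point by bisection.

```python
# Hypothetical example: f(x, y) = x**3 * y, so f_x(x, y) = 3 * x**2 * y.
def f(x, y):
    return x ** 3 * y

def f_x(x, y):
    return 3.0 * x ** 2 * y

def mvt_point(x, y, dx, iterations=200):
    """Find c1 in (x, x + dx) with f_x(c1, y) * dx == f(x + dx, y) - f(x, y).

    Bisection works here because f_x(., y) is increasing for x > 0, y > 0
    and dx > 0 (assumptions of this example, not of the proof).
    """
    slope = (f(x + dx, y) - f(x, y)) / dx
    lo, hi = x, x + dx
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f_x(mid, y) < slope:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x0, y0 = 1.0, 2.0
for dx in (1.0, 0.1, 0.01, 0.001):
    c1 = mvt_point(x0, y0, dx)
    # The gap c1 - x0 should shrink together with dx.
    print(dx, c1, c1 - x0)
```

In every row $c_1$ stays strictly inside $(x, x+\Delta x)$, so it is squeezed towards $x$ as $\Delta x$ shrinks.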
The last step of the proof is to sum this all up, divide by $\Delta t$ and take the limit as $\Delta t \to 0$
$$ f(x(t+\Delta t), y(t+\Delta t)) - f(x, y) = f_x(c_1, y)\Delta x + f_y(x+\Delta x, c_2)\Delta y $$
$$ \lim_{\Delta t \to 0} \frac {f(x(t+\Delta t), y(t+\Delta t)) - f(x, y)}{\Delta t} = \lim_{\Delta t \to 0} \left[ f_x(c_1, y)\frac {\Delta x}{\Delta t} + f_y(x+\Delta x, c_2)\frac {\Delta y}{\Delta t} \right] = f_x(x, y)x'(t) + f_y(x, y)y'(t) \qquad \text{QED} $$
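If you want to convince yourself numerically, the formula $z'(t) = f_x x'(t) + f_y y'(t)$ can be compared with a difference quotient. This is only a sanity check under assumed example functions ($f$, $x(t)$, $y(t)$ below are hypothetical choices of mine), not a substitute for the proof.

```python
import math

# Hypothetical example functions, chosen only for illustration.
def f(x, y):
    return x * x * y + math.sin(y)

def f_x(x, y):
    return 2.0 * x * y

def f_y(x, y):
    return x * x + math.cos(y)

def x_of(t):
    return math.cos(t)

def y_of(t):
    return t * t

def dx_dt(t):
    return -math.sin(t)

def dy_dt(t):
    return 2.0 * t

def z(t):
    return f(x_of(t), y_of(t))

def chain_rule(t):
    """Right-hand side f_x * x'(t) + f_y * y'(t), evaluated at (x(t), y(t))."""
    x, y = x_of(t), y_of(t)
    return f_x(x, y) * dx_dt(t) + f_y(x, y) * dy_dt(t)

def central_difference(func, t, h=1e-6):
    """Symmetric difference quotient approximating func'(t)."""
    return (func(t + h) - func(t - h)) / (2.0 * h)

t0 = 0.7
# The two printed numbers should agree to many decimal places.
print(chain_rule(t0), central_difference(z, t0))
```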
Edit: After a long time I've realised that this proof assumes that $f$ has partial derivatives defined on intervals around the point $(x, y)$ and they are continuous at the point. This is a sufficient condition for the function to be ($\mathbb{R}^2$-)differentiable at $(x, y)$, but it's not equivalent. Yet, the multivariable chain rule works for the function being just differentiable at that point. So for a general proof, one should first understand little-o notation as in the other answers.
The formulas you have stated above are not the chain rule in the original sense.
The chain rule simply says that if you compose two differentiable functions, say $f$ with $f:M\to N$ and $g$ with $g:V\to W$ where $g(V)\subseteq M$ then $(f\circ g)$ is again a differentiable function with $(f\circ g):V\to N$ and $D(f\circ g) = Df(g)Dg$. Here $Df$ denotes the (total) derivative of $f$.
In order to transfer this idea to your problem one needs to make further assumptions on the given functions $f$, $x$ and $y$.
I will assume that $f:\mathbb{R}^2\to \mathbb{R}$, with ${x\choose y}\mapsto f {x\choose y}$, $x:\mathbb{R}^2\to \mathbb{R}$ with ${u\choose v}\mapsto x{u\choose v}$ and $y:\mathbb{R}^2\to \mathbb{R}$ with ${u\choose v}\mapsto y{u\choose v}$. Note that you can't compose $f$ solely with $x$ or solely with $y$ because the images of $x$ and $y$ are subsets of $\mathbb{R}$, not of $\mathbb{R}^2$. However, you can define another function $g:\mathbb{R}^2\to \mathbb{R}^2$ with $g{u \choose v}={x(u,v) \choose y(u,v)}$. Now, under the assumption that $f$ and $g$ are differentiable, you can apply the chain rule as follows (where $D_x$ and $D_y$ denote the corresponding partial derivatives):
$$D(f\circ g){u \choose v} = Df\left(g{u \choose v}\right)Dg{u \choose v}=\left(\begin{array}{rr}
D_xf(g{u \choose v}), & D_yf(g{u \choose v}) \\\end{array}\right)
\left(\begin{array}{rr}
D_ux{u \choose v} & D_vx{u \choose v} \\
D_uy{u \choose v} & D_vy{u \choose v} \\
\end{array}\right)$$
If you multiply both matrices you get:
$$D(f\circ g){u \choose v}= \left(\begin{array}{rr}
D_xf(g{u \choose v})D_ux{u \choose v}+ D_yf(g{u \choose v})D_uy{u \choose v} ,& D_xf(g{u \choose v})D_vx{u \choose v}+ D_yf(g{u \choose v})D_vy{u \choose v}\\\end{array}\right).$$
The first column of this matrix is your first formula and the second column corresponds to your second formula. So this is basically what you were trying to say when you refer to applying the chain rule, but in a rigorous way.
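As a quick sanity check of the product $Df(g)\,Dg$, one can compare it with finite differences of $f\circ g$. The sketch below assumes hypothetical example functions $f$, $x(u,v)$ and $y(u,v)$ of my own choosing, with their partial derivatives written out by hand.

```python
import math

# Hypothetical example functions, chosen only for illustration.
def f(x, y):
    return x * y + math.sin(x)

def x_of(u, v):
    return u * u + v

def y_of(u, v):
    return u * math.exp(v)

def composite(u, v):
    return f(x_of(u, v), y_of(u, v))

def jacobian_product(u, v):
    """Row vector Df(g(u, v)) times the 2x2 matrix Dg(u, v)."""
    x, y = x_of(u, v), y_of(u, v)
    df = (y + math.cos(x), x)                 # (D_x f, D_y f)
    dg = ((2.0 * u, 1.0),                     # (D_u x, D_v x)
          (math.exp(v), u * math.exp(v)))     # (D_u y, D_v y)
    d_u = df[0] * dg[0][0] + df[1] * dg[1][0]
    d_v = df[0] * dg[0][1] + df[1] * dg[1][1]
    return d_u, d_v

def numeric_gradient(u, v, h=1e-6):
    """Central differences of the composite in u and in v."""
    d_u = (composite(u + h, v) - composite(u - h, v)) / (2.0 * h)
    d_v = (composite(u, v + h) - composite(u, v - h)) / (2.0 * h)
    return d_u, d_v

# Both lines should print (almost) the same pair of numbers.
print(jacobian_product(0.5, 0.3))
print(numeric_gradient(0.5, 0.3))
```

Each entry of the row vector pairs $D_xf$ with a derivative of $x$ and $D_yf$ with the derivative of $y$ in the same variable, which is exactly the row-times-column pattern of the matrix product.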
Now you want to extend the example to the case where $y$ is a function of $a,b$. This means that $y$ is defined on a different domain than $x$. As I mentioned earlier you can't simply compose $f$ with $x$ and $y$ but must define another function $g$ whose image matches the domain of $f$. The question now would be how to define this function $g$? It must be something like:
$$g:\mathbb{R}^4\to\mathbb{R}^2,~g\left(\begin{array}{c}
u\\ v\\ a\\ b\end{array}\right)={x(u,v,a,b) \choose y(u,v,a,b)}={x(u,v) \choose y(a,b)}.$$
Note that $x$ and $y$ are technically functions of $(u,v,a,b)$. However, the variables $a,b$ don't appear in the function $x$ and $u,v$ don't appear in $y$. You can now apply the chain rule as well to $(f\circ g):\mathbb{R}^4\to\mathbb{R}$. The difference to the case where $y$ is also a function of $u,v$ is that:
$$D(f\circ g){u \choose v} =\left(\begin{array}{rr}
D_xf(g{u \choose v}), & D_yf(g{u \choose v}) \\\end{array}\right)\left(\begin{array}{rr}
D_ux{u \choose v} & D_vx{u \choose v} \\
D_uy{u \choose v} & D_vy{u \choose v} \\
\end{array}\right)$$
now becomes
$$D(f\circ g)(u,v,a,b) = \left(\begin{array}{rr}
D_xf(g(u,v,a,b)), & D_yf(g(u,v,a,b)) \\\end{array}\right) \left(\begin{array}{rrrr}
D_ux{u \choose v} & D_vx{u \choose v} & D_ax{u \choose v} & D_bx{u \choose v}\\
D_uy{a \choose b} & D_vy{a \choose b} & D_ay{a \choose b} & D_by{a \choose b} \\
\end{array}\right)$$
$$=\left(\begin{array}{rr}
D_xf(g(u,v,a,b)), & D_yf(g(u,v,a,b)) \\\end{array}\right)\left(\begin{array}{rrrr}
D_ux{u \choose v} & D_vx{u \choose v} & 0 & 0\\
0 & 0 & D_ay{a \choose b} & D_by{a \choose b} \\
\end{array}\right).$$
So multiplying both matrices will now yield four formulas.
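The four formulas are $D_xf\,D_ux$, $D_xf\,D_vx$, $D_yf\,D_ay$ and $D_yf\,D_by$, since the zero entries remove the mixed terms. Here is a small sketch that checks them against finite differences, assuming hypothetical example functions $f$, $x(u,v)$ and $y(a,b)$ that I picked only for illustration.

```python
import math

# Hypothetical example functions, chosen only for illustration.
def f(x, y):
    return x * x * y

def x_of(u, v):
    return u + math.sin(v)

def y_of(a, b):
    return a * math.exp(b)

def composite(u, v, a, b):
    return f(x_of(u, v), y_of(a, b))

def four_formulas(u, v, a, b):
    """Entries of Df(g) * Dg when x depends only on (u, v) and y only on (a, b)."""
    x, y = x_of(u, v), y_of(a, b)
    f_x, f_y = 2.0 * x * y, x * x
    return (
        f_x * 1.0,              # D_x f * D_u x
        f_x * math.cos(v),      # D_x f * D_v x
        f_y * math.exp(b),      # D_y f * D_a y
        f_y * a * math.exp(b),  # D_y f * D_b y
    )

def numeric_gradient(point, h=1e-6):
    """Central differences of the composite in each of the four variables."""
    grads = []
    for i in range(4):
        plus = list(point)
        minus = list(point)
        plus[i] += h
        minus[i] -= h
        grads.append((composite(*plus) - composite(*minus)) / (2.0 * h))
    return tuple(grads)

p = (0.4, 1.1, 0.8, -0.2)
print(four_formulas(*p))
print(numeric_gradient(p))
```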
Best Answer
We can regard \begin{align*} S:\frac{1}{x}+\arctan(y+2z)=1\tag{1} \end{align*} as defining $x$ as a function $x=x(y,z)$.
Comment:
In (2) we use $\left(\frac{1}{g(z)}\right)^{\prime}=-\frac{\left(g(z)\right)^{\prime}}{(g(z))^2}$.
In (3) we use $x=\frac{1}{1-\arctan(y+2z)}$.