For simplicity, let us consider $\mathbb{R}^2$ and $f:\mathbb{R}^2\rightarrow \mathbb{R}^2$. Then $\phi:\mathbb{R}^2\rightarrow \mathbb{R}^2$ is given by
\begin{align}
\phi(x_1, x_2) =&\
\begin{pmatrix}
\phi_1(x_1, x_2)\\
\phi_2(x_1, x_2)
\end{pmatrix}\\
=&\
\begin{pmatrix}
x_1\\
x_2
\end{pmatrix}
+
\begin{pmatrix}
f_{1, x_1} (a_1, a_2) & f_{1, x_2}(a_1, a_2)\\
f_{2, x_1} (a_1, a_2) & f_{2, x_2}(a_1, a_2)
\end{pmatrix}^{-1}
\left(
\begin{pmatrix}
y_1\\
y_2
\end{pmatrix}
-
\begin{pmatrix}
f_1(x_1, x_2)\\
f_2(x_1, x_2)
\end{pmatrix}
\right)\\
=&\ \begin{pmatrix}
x_1\\
x_2
\end{pmatrix}- \frac{1}{\det Df(a)}\begin{pmatrix}
f_{2, x_2} (a_1, a_2)f_1(x_1, x_2) -f_{1, x_2}(a_1, a_2)f_2(x_1, x_2)\\
-f_{2, x_1} (a_1, a_2)f_1(x_1, x_2)+ f_{1, x_1}(a_1, a_2)f_2(x_1, x_2)
\end{pmatrix}
+\text{ constant vector}
\end{align}
Differentiating each component then gives
\begin{align}
D\phi(x_1, x_2) =&\
\begin{pmatrix}
\phi_{1, x_1} & \phi_{1, x_2}\\
\phi_{2, x_1} & \phi_{2, x_2}
\end{pmatrix}\\
=&\
\begin{pmatrix}
1 & 0\\
0 & 1
\end{pmatrix}
-\frac{1}{\det Df(a)}
\begin{pmatrix}
f_{2, x_2} (a_1, a_2)f_{1, x_1}(x_1, x_2) -f_{1, x_2}(a_1, a_2)f_{2, x_1}(x_1, x_2) & f_{2, x_2} (a_1, a_2)f_{1, x_2}(x_1, x_2) -f_{1, x_2}(a_1, a_2)f_{2, x_2}(x_1, x_2) \\
-f_{2, x_1} (a_1, a_2)f_{1, x_1}(x_1, x_2)+ f_{1, x_1}(a_1, a_2)f_{2, x_1}(x_1, x_2) & -f_{2, x_1} (a_1, a_2)f_{1, x_2}(x_1, x_2)+ f_{1, x_1}(a_1, a_2)f_{2, x_2}(x_1, x_2)
\end{pmatrix}\\
=&\
\begin{pmatrix}
1 & 0\\
0 & 1
\end{pmatrix}
-\frac{1}{\det Df(a)}\begin{pmatrix}
f_{2, x_2} (a_1, a_2)& -f_{1, x_2}(a_1, a_2)\\
-f_{2, x_1} (a_1, a_2)& f_{1, x_1}(a_1, a_2)
\end{pmatrix}
\begin{pmatrix}
f_{1, x_1} (x_1, x_2)& f_{1, x_2}(x_1, x_2)\\
f_{2, x_1} (x_1, x_2)& f_{2, x_2}(x_1, x_2)
\end{pmatrix}\\
=&\ I-Df(a_1, a_2)^{-1} Df(x_1, x_2).
\end{align}
Higher Dimension:
However, when $n$ is large, expanding everything and then taking partial derivatives entry by entry gets messy, so it is better to compute the derivative directly.
In general, write $A = Df(a)$. Since $A^{-1}$ is a constant linear map, it is its own derivative, and the chain rule gives
\begin{align}
D_x \phi(x) =&\ D_x x+ D_x [A^{-1}(y-f(x))]\\
=&\ I+ A^{-1}\circ D_x[y-f(x)]\\
=&\ I+ A^{-1}\circ(-Df(x)) = I-Df(a)^{-1} Df(x).
\end{align}
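The formula $D\phi(x) = I - Df(a)^{-1}Df(x)$ can be sanity-checked numerically. The sketch below is only an illustration: the map `f`, the base point `a`, the value `y`, and the test point `x` are all made-up choices, not anything from Rudin. It compares the closed-form Jacobian with a central finite-difference approximation of $\phi$.

```python
import math

# Hypothetical smooth map f: R^2 -> R^2, chosen only for illustration.
def f(x1, x2):
    return (x1 + x2**2 + 0.5 * x2, math.sin(x1) + 2.0 * x2)

# Its Jacobian Df(x), computed by hand.
def Df(x1, x2):
    return [[1.0, 2.0 * x2 + 0.5], [math.cos(x1), 2.0]]

def inv2(m):
    # Inverse of a 2x2 matrix via the adjugate, as in the expansion above.
    (p, q), (r, s) = m
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def matmul2(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec2(m, v):
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

a = (0.3, -0.2)          # assumed base point
y = (0.1, 0.4)           # assumed target value; it drops out of the derivative
A_inv = inv2(Df(*a))     # A^{-1} with A = Df(a)

def phi(x1, x2):
    # phi(x) = x + A^{-1}(y - f(x))
    fx = f(x1, x2)
    corr = matvec2(A_inv, (y[0] - fx[0], y[1] - fx[1]))
    return (x1 + corr[0], x2 + corr[1])

def jacobian_formula(x1, x2):
    # Closed form: I - A^{-1} Df(x)
    prod = matmul2(A_inv, Df(x1, x2))
    return [[(1.0 if i == j else 0.0) - prod[i][j] for j in range(2)]
            for i in range(2)]

def jacobian_fd(x1, x2, h=1e-6):
    # Central differences; entry (i, j) approximates d phi_i / d x_j.
    plus1, minus1 = phi(x1 + h, x2), phi(x1 - h, x2)
    plus2, minus2 = phi(x1, x2 + h), phi(x1, x2 - h)
    return [[(plus1[i] - minus1[i]) / (2 * h), (plus2[i] - minus2[i]) / (2 * h)]
            for i in range(2)]

x = (0.7, 0.1)
J_formula = jacobian_formula(*x)
J_fd = jacobian_fd(*x)
max_err = max(abs(J_formula[i][j] - J_fd[i][j])
              for i in range(2) for j in range(2))
print("max entrywise difference:", max_err)
```

Evaluating `jacobian_formula(*a)` should also return (numerically) the zero matrix, since $D\phi(a) = I - A^{-1}A = 0$; this is what makes $\phi$ a contraction near $a$.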
Best Answer
$\phi$ is defined everywhere, and $\phi(x) = x$ iff $Df(a)^{-1}(y - f(x)) = 0$, that is, iff $f(x) = y$.
Rudin states that there is 'at most one fixed point in $U$'.
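If there were two distinct fixed points $u_1,u_2 \in U$, then $\phi(u_k) = u_k$, and Rudin's contraction estimate $|\phi(x_1)-\phi(x_2)| \le \tfrac{1}{2}|x_1-x_2|$ on $U$ gives $|u_1-u_2| = |\phi(u_1)-\phi(u_2)| \le \tfrac{1}{2}|u_1-u_2|$. This forces $u_1=u_2$, a contradiction. Hence there is at most one fixed point in $U$, and therefore at most one $x \in U$ such that $f(x) = y$.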
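To see the fixed-point correspondence concretely, here is a small sketch that iterates $x \mapsto \phi(x)$ and watches it settle on a solution of $f(x) = y$. Everything specific in it is an assumption made for illustration: the map `f`, the base point `a`, and the point `target` used to manufacture a `y` close to $f(a)$. The argument above only gives uniqueness; the convergence seen here relies on `y` being close enough to $f(a)$ that the iterates stay where $\phi$ contracts.

```python
import math

# Hypothetical smooth map f: R^2 -> R^2, chosen only for illustration.
def f(x):
    x1, x2 = x
    return (x1 + x2**2 + 0.5 * x2, math.sin(x1) + 2.0 * x2)

def Df(x):
    x1, x2 = x
    return [[1.0, 2.0 * x2 + 0.5], [math.cos(x1), 2.0]]

def solve2(m, v):
    # Solve the 2x2 system m w = v by Cramer's rule.
    (p, q), (r, s) = m
    det = p * s - q * r
    return ((s * v[0] - q * v[1]) / det, (-r * v[0] + p * v[1]) / det)

a = (0.3, -0.2)            # assumed base point
A = Df(a)                  # A = Df(a), fixed once and for all
target = (0.35, -0.15)     # assumed nearby point, used only to build y
y = f(target)              # so f(x) = y has the known solution x = target

def phi(x):
    # phi(x) = x + A^{-1}(y - f(x)); a fixed point satisfies f(x) = y.
    fx = f(x)
    w = solve2(A, (y[0] - fx[0], y[1] - fx[1]))
    return (x[0] + w[0], x[1] + w[1])

x = a
gaps = []                  # distance from each iterate to the known solution
for _ in range(50):
    x = phi(x)
    gaps.append(math.hypot(x[0] - target[0], x[1] - target[1]))

print("final iterate:", x)
print("first few distances to the solution:", gaps[:3])
```

Note that $A$ is never updated during the iteration, which is what distinguishes this map from Newton's method, where the Jacobian is recomputed at every step.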