Understanding a step in Rudin’s proof of the Inverse Function Theorem

analysis, chain-rule, derivatives, multivariable-calculus

I am reading Rudin's "Principles of Mathematical Analysis", and in one step of the proof of the inverse function theorem he says that the derivative of:

$\phi(x)=x+A^{-1}(y-f(x))$

is

$\phi'(x)=I-A^{-1}f'(x)$

where $A=f'(a)$ and $a$ is a vector in $\mathbb R^n$ (as are $x$ and $y$).

I read on some online sources that this derivative uses the chain rule, although I am not sure how. Could somebody walk me through the steps on how this derivative was computed?

Thanks!

Best Answer

For simplicity, let us consider $\mathbb{R}^2$ and $f:\mathbb{R}^2\rightarrow \mathbb{R}^2$ with components $f_1, f_2$, writing $f_{i, x_j}$ for $\partial f_i/\partial x_j$. Then $\phi:\mathbb{R}^2\rightarrow \mathbb{R}^2$ is given by \begin{align} \phi(x_1, x_2) =&\ \begin{pmatrix} \phi_1(x_1, x_2)\\ \phi_2(x_1, x_2) \end{pmatrix}\\ =&\ \begin{pmatrix} x_1\\ x_2 \end{pmatrix} + \begin{pmatrix} f_{1, x_1} (a_1, a_2) & f_{1, x_2}(a_1, a_2)\\ f_{2, x_1} (a_1, a_2) & f_{2, x_2}(a_1, a_2) \end{pmatrix}^{-1} \left( \begin{pmatrix} y_1\\ y_2 \end{pmatrix} - \begin{pmatrix} f_1(x_1, x_2)\\ f_2(x_1, x_2) \end{pmatrix} \right)\\ =&\ \begin{pmatrix} x_1\\ x_2 \end{pmatrix}- \frac{1}{\det Df(a)}\begin{pmatrix} f_{2, x_2} (a_1, a_2)f_1(x_1, x_2) -f_{1, x_2}(a_1, a_2)f_2(x_1, x_2)\\ -f_{2, x_1} (a_1, a_2)f_1(x_1, x_2)+ f_{1, x_1}(a_1, a_2)f_2(x_1, x_2) \end{pmatrix} +\text{ const vector}, \end{align} where the constant vector is $Df(a)^{-1}y$, which does not depend on $x$.

Differentiating each component with respect to $x_1$ and $x_2$, we see that the Jacobian matrix of $\phi$ is \begin{align} D\phi(x_1, x_2) =&\ \begin{pmatrix} \phi_{1, x_1} & \phi_{1, x_2}\\ \phi_{2, x_1} & \phi_{2, x_2} \end{pmatrix}\\ =&\ \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix} -\frac{1}{\det Df(a)} \begin{pmatrix} f_{2, x_2} (a_1, a_2)f_{1, x_1}(x_1, x_2) -f_{1, x_2}(a_1, a_2)f_{2, x_1}(x_1, x_2) & f_{2, x_2} (a_1, a_2)f_{1, x_2}(x_1, x_2) -f_{1, x_2}(a_1, a_2)f_{2, x_2}(x_1, x_2) \\ -f_{2, x_1} (a_1, a_2)f_{1, x_1}(x_1, x_2)+ f_{1, x_1}(a_1, a_2)f_{2, x_1}(x_1, x_2) & -f_{2, x_1} (a_1, a_2)f_{1, x_2}(x_1, x_2)+ f_{1, x_1}(a_1, a_2)f_{2, x_2}(x_1, x_2) \end{pmatrix}\\ =&\ \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix} -\frac{1}{\det Df(a)}\begin{pmatrix} f_{2, x_2} (a_1, a_2)& -f_{1, x_2}(a_1, a_2)\\ -f_{2, x_1} (a_1, a_2)& f_{1, x_1}(a_1, a_2) \end{pmatrix} \begin{pmatrix} f_{1, x_1} (x_1, x_2)& f_{1, x_2}(x_1, x_2)\\ f_{2, x_1} (x_1, x_2)& f_{2, x_2}(x_1, x_2) \end{pmatrix}\\ =&\ I-Df(a_1, a_2)^{-1} Df(x_1, x_2). \end{align}
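As a quick sanity check, it may help to look at the case $n=1$, where $A=f'(a)$ is just a nonzero number and no matrices are involved. Assuming $f$ is differentiable near $a$, the ordinary rules of single-variable calculus give \begin{align} \phi(x) =&\ x + \frac{y - f(x)}{f'(a)},\\ \phi'(x) =&\ 1 - \frac{f'(x)}{f'(a)},\\ \phi'(a) =&\ 1 - \frac{f'(a)}{f'(a)} = 0. \end{align} This agrees with the formula $\phi'(x)=I-A^{-1}f'(x)$ from the question when $n=1$. In particular, the derivative vanishes at $x=a$; the same holds in any dimension, since $I-A^{-1}A=0$.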

Higher Dimension:

However, when $n$ is large, expanding everything out and then taking partial derivatives is messy, so it is cleaner to compute the derivative directly with the chain rule.

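In general, write $\phi(x) = x + L(g(x))$, where $g(x) = y - f(x)$ and $L(u) = A^{-1}u$ is a linear map. A linear map is its own derivative, so $DL(u) = A^{-1}$ at every point $u$, and $Dg(x) = -Df(x)$ because $y$ is constant. The chain rule then gives \begin{align} D_x \phi(x) =&\ D_x x+ D_x [A^{-1}(y-f(x))]\\ =&\ I+ DL(g(x))\circ Dg(x)\\ =&\ I+ A^{-1}\circ(-Df(x)) = I-Df(a)^{-1} Df(x). \end{align}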
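The formula can also be checked numerically. The sketch below is only an illustration under stated assumptions: the map `f`, the points `a`, `x`, `y`, and all helper names are hypothetical choices and are not taken from Rudin. It compares $I - Df(a)^{-1}Df(x)$ with a central finite-difference approximation of the Jacobian of $\phi$ in $\mathbb{R}^2$, using only the Python standard library.

```python
# Finite-difference check of D(phi)(x) = I - Df(a)^{-1} Df(x) in R^2.
# The sample map f and the points a, x, y are hypothetical choices.

def f(x):
    """Hypothetical sample map f: R^2 -> R^2."""
    x1, x2 = x
    return [x1**2 + x2, x1 * x2 + x2**3]

def Df(x):
    """Jacobian of the sample f, computed by hand."""
    x1, x2 = x
    return [[2 * x1, 1.0],
            [x2, x1 + 3 * x2**2]]

def inv2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (p, q), (r, s) = M
    det = p * s - q * r
    return [[s / det, -q / det],
            [-r / det, p / det]]

def matmul(M, N):
    """Product of two 2x2 matrices."""
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec(M, v):
    """Product of a 2x2 matrix with a vector in R^2."""
    return [M[i][0] * v[0] + M[i][1] * v[1] for i in range(2)]

def max_abs_diff(M, N):
    """Largest entrywise difference between two 2x2 matrices."""
    return max(abs(M[i][j] - N[i][j]) for i in range(2) for j in range(2))

a = [1.0, 2.0]      # base point; Df(a) has determinant 24, so it is invertible
y = [0.5, -0.3]     # fixed target vector
A_inv = inv2(Df(a))

def phi(x):
    """phi(x) = x + A^{-1} (y - f(x))."""
    fx = f(x)
    corr = matvec(A_inv, [y[0] - fx[0], y[1] - fx[1]])
    return [x[0] + corr[0], x[1] + corr[1]]

def dphi_formula(x):
    """Jacobian of phi from the closed form I - A^{-1} Df(x)."""
    P = matmul(A_inv, Df(x))
    return [[(1.0 if i == j else 0.0) - P[i][j] for j in range(2)]
            for i in range(2)]

def dphi_numeric(x, h=1e-6):
    """Jacobian of phi from central differences, one column per variable."""
    J = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = phi(xp), phi(xm)
        for i in range(2):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

x = [0.7, 1.5]
print("max entrywise difference:", max_abs_diff(dphi_formula(x), dphi_numeric(x)))
print("D(phi) at a:", dphi_formula(a))
```

The first printed value should be tiny (limited by finite-difference and rounding error), and the Jacobian at `a` should be the zero matrix up to rounding, reflecting $I - A^{-1}A = 0$.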