[Math] Inverse Function Theorem, Spivak’s Proof

multivariable-calculus

I'm having a lot of trouble following the proof of the theorem below, from Spivak's Calculus on Manifolds.

2-11 Theorem (Inverse Function Theorem). Suppose that $f: \mathbb{R}^n \to \mathbb{R}^n$ is continuously differentiable in an open set containing $a$, and let $\det f'(a) \not= 0$. Then there is an open set $V$ containing $a$ and an open set $W$ containing $f(a)$ such that $f: V\to W$ has a continuous inverse $f^{-1}:W \to V$ which is differentiable and for all $y\in W$ satisfies
$$ (f^{-1})'(y) = [f'(f^{-1}(y))]^{-1}.$$

The proof starts off with the following.

Proof. Let $\lambda$ be the linear transformation $Df(a)$. Then $\lambda$ is non-singular, since $\det f'(a) \not= 0$. Now $D(\lambda^{-1}\circ f)(a) = D(\lambda^{-1})(f(a))\circ Df(a) = \lambda^{-1}\circ Df(a)$ is the identity linear transformation. If the theorem is true for $\lambda^{-1}\circ f$, it is clearly true for $f$. Therefore we may assume at the outset that $\lambda$ is the identity.
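To make the chain-rule computation in this paragraph concrete, here is a small numerical sketch. The map $f(x, y) = (e^x + y,\ x - y^2)$ and the point $a = (0, 1)$ are hypothetical choices, not taken from Spivak. For this map $f'(a) = \begin{pmatrix}1 & 1\\ 1 & -2\end{pmatrix}$, with determinant $-3 \neq 0$. The code estimates the Jacobian of $\lambda^{-1}\circ f$ at $a$ by central differences, and it should come out close to the identity.

```python
import math

# Hypothetical example map f: R^2 -> R^2 (not from Spivak): f(x, y) = (e^x + y, x - y^2).
def f(p):
    x, y = p
    return (math.exp(x) + y, x - y * y)

def jacobian(p):
    """Exact Jacobian matrix f'(p) of the example map, as nested tuples."""
    x, y = p
    return ((math.exp(x), 1.0), (1.0, -2.0 * y))

def inv2(m):
    """Inverse of a 2x2 matrix given as nested tuples."""
    (m00, m01), (m10, m11) = m
    det = m00 * m11 - m01 * m10
    if det == 0:
        raise ValueError("matrix is singular")
    return ((m11 / det, -m01 / det), (-m10 / det, m00 / det))

def apply(m, v):
    """Matrix-vector product m v for a 2x2 matrix and a 2-vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

def fd_jacobian(g, p, h=1e-6):
    """Central-difference estimate of g'(p); entry (i, j) approximates dg_i/dp_j."""
    J = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        step = [0.0, 0.0]
        step[j] = h
        g_plus = g((p[0] + step[0], p[1] + step[1]))
        g_minus = g((p[0] - step[0], p[1] - step[1]))
        for i in range(2):
            J[i][j] = (g_plus[i] - g_minus[i]) / (2 * h)
    return (tuple(J[0]), tuple(J[1]))

a = (0.0, 1.0)
lam = jacobian(a)        # lambda = Df(a) = [[1, 1], [1, -2]], det = -3
lam_inv = inv2(lam)

def g(p):
    """The normalized map lambda^{-1} o f."""
    return apply(lam_inv, f(p))

Dg = fd_jacobian(g, a)
print(Dg)  # entries close to the identity matrix, up to finite-difference error
```

Any other continuously differentiable map with $\det f'(a) \neq 0$ should behave the same way. Composing with $\lambda^{-1}$ always normalizes the derivative at $a$ to the identity, and that is exactly what "we may assume at the outset that $\lambda$ is the identity" relies on.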

The problem I'm having is that I don't quite see where he's trying to go with this argument. I saw another post (Spivak's proof of Inverse Function Theorem) that explained the statement

If the theorem is true for $\lambda^{-1}\circ f$, it is clearly true for $f$.

but exactly why is it necessary for this proof? Also, when it says

Therefore we may assume at the outset that $\lambda$ is the identity.

why are we assuming that $\lambda$ is the identity?

Best Answer

It's a common strategy, both in devising proofs and in presenting them, to prove a general statement in two steps: 1) show that if the statement holds in every case satisfying a certain extra assumption, then it holds in all cases; 2) make that assumption and prove the statement under it. The phrase "Assume without loss of generality that..." often signals this: we make an assumption, but by step 1) proving the proposition under this assumption also proves the general case (i.e. we don't lose any generality from the proof).
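In Spivak's proof, step 1) is the sentence "If the theorem is true for $\lambda^{-1}\circ f$, it is clearly true for $f$." Here is a sketch of why, identifying $\lambda$ with its matrix and writing $g = \lambda^{-1}\circ f$. Since $\lambda^{-1}$ is linear, $g$ is continuously differentiable wherever $f$ is, with $g'(x) = \lambda^{-1}f'(x)$. In particular $g'(a)$ is the identity, so $\det g'(a) = 1 \neq 0$. Suppose the theorem, applied to $g$, gives open sets $V \ni a$ and $W' \ni g(a)$ such that $g: V\to W'$ has a continuous, differentiable inverse with $(g^{-1})'(z) = [g'(g^{-1}(z))]^{-1}$ for $z\in W'$. Then $f = \lambda\circ g$. The set $W := \lambda(W')$ is open, being the preimage of $W'$ under the continuous map $\lambda^{-1}$, and it contains $\lambda(g(a)) = f(a)$. The map $f: V\to W$ is a bijection with inverse $f^{-1} = g^{-1}\circ\lambda^{-1}$, which is continuous and differentiable as a composition. The chain rule gives, for $y\in W$,
$$
\begin{aligned}
(f^{-1})'(y) &= (g^{-1})'(\lambda^{-1}y)\,\lambda^{-1} = [g'(g^{-1}(\lambda^{-1}y))]^{-1}\lambda^{-1}\\
&= [\lambda^{-1}f'(f^{-1}(y))]^{-1}\lambda^{-1} = [f'(f^{-1}(y))]^{-1}\lambda\,\lambda^{-1} = [f'(f^{-1}(y))]^{-1}.
\end{aligned}
$$
So $f$ satisfies the conclusion of the theorem with the same $V$. This is why it is enough to carry out step 2), proving the theorem when $\lambda$ is the identity.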

Why do we do it? Usually to reduce the complexity of an argument, either by cutting down the number of symbols in a calculation or by cutting down the number of cases. These extraneous cases and symbols can obscure what should be a relatively simple, intuitive argument. Making such assumptions makes the proof easier to digest, which is always a plus!

So the reduction isn't logically necessary. I'm guessing Spivak assumes $\lambda = I$ because it simplifies the notation greatly. You could try redoing the proof without this assumption, with $\lambda$ cropping up everywhere, and you should see why Spivak has done this.
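As a rough illustration (a sketch, not a transcription of Spivak's text), consider a lower bound of the kind used to show that $f$ is one-to-one near $a$. Suppose $x_1, x_2$ are close enough to $a$ that $|f(x_1)-f(x_2)-\lambda(x_1-x_2)| \le \varepsilon|x_1-x_2|$ for some $\varepsilon > 0$; continuity of $f'$ makes this possible. With $\lambda = I$ and $\varepsilon = \tfrac12$, the triangle inequality gives a clean bound:
$$|f(x_1)-f(x_2)| \ge |x_1-x_2| - |f(x_1)-f(x_2)-(x_1-x_2)| \ge \tfrac12|x_1-x_2|.$$
For a general $\lambda$ one first needs $|\lambda(v)| \ge |v|/\|\lambda^{-1}\|$, which follows from $|v| = |\lambda^{-1}(\lambda(v))| \le \|\lambda^{-1}\|\,|\lambda(v)|$, with $\|\cdot\|$ the operator norm. The bound then becomes
$$|f(x_1)-f(x_2)| \ge |\lambda(x_1-x_2)| - \varepsilon|x_1-x_2| \ge \left(\frac{1}{\|\lambda^{-1}\|}-\varepsilon\right)|x_1-x_2|,$$
so $\varepsilon$ must be chosen below $1/\|\lambda^{-1}\|$, and that constant then has to be carried through every later step.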

I was hoping to provide another, simple example of an assumption made without loss of generality in a proof, but my mind has gone blank. Maybe a commenter could provide one?