$\lambda\colon \mathbb{R}^n\to\mathbb{R}^n$ is a bijection and $\lambda$ and $\lambda^{-1}$ are both continuously differentiable.
Note that, because $\lambda$ is linear, $\lambda'(z) = \lambda$ for all $z \in \mathbb{R}^n$.
Let $g = \lambda^{-1}\circ f$.
Suppose the theorem is true for $g$.
Then there is an open set $V'$ containing $a$ and an open set $W'$ containing $g(a)$ such that $g:V'\to W'$ has a continuous inverse $g^{-1}:W'\to V'$ which is differentiable and for all $y\in W'$ satisfies $$(g^{-1})'(y) = [g'(g^{-1}(y))]^{-1}$$
Then $\lambda(W')$ is open and $f = \lambda\circ g:V'\to \lambda(W')$ has a continuous inverse $g^{-1}\circ \lambda^{-1}:\lambda(W')\to V'$.
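By the chain rule, and because $(\lambda^{-1})'(z) = \lambda^{-1}$ for the linear map $\lambda^{-1}$, for all $z \in \lambda(W')$ we have
$$\begin{aligned}
(f^{-1})'(z) &= (g^{-1}\circ\lambda^{-1})'(z) \\
&= (g^{-1})'(\lambda^{-1}(z))\circ \lambda^{-1} \\
&= [g'(g^{-1}(\lambda^{-1}(z)))]^{-1}\circ \lambda^{-1} \\
&= [g'(f^{-1}(z))]^{-1}\circ \lambda^{-1} \\
&= [\lambda\circ g'(f^{-1}(z))]^{-1} \\
&= [f'(f^{-1}(z))]^{-1},
\end{aligned}$$
where the last step uses $f' = \lambda\circ g'$, which follows from $f = \lambda\circ g$.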
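As a sanity check, here is the same computation in one dimension. The function $f(x) = 2x + x^2$ and the point $a = 0$ are chosen here purely for illustration and do not come from Spivak. In this case $\lambda = f'(0)$ is multiplication by $2$, so $g(x) = \lambda^{-1}(f(x)) = x + x^2/2$ and $g'(0) = 1$. Near $0$ the inverses can be written down explicitly:
$$\begin{aligned}
g^{-1}(y) &= -1 + \sqrt{1+2y}, \\
f^{-1}(z) &= g^{-1}(\lambda^{-1}(z)) = g^{-1}(z/2) = -1 + \sqrt{1+z}, \\
(f^{-1})'(z) &= \frac{1}{2\sqrt{1+z}} = \frac{1}{2 + 2f^{-1}(z)} = [f'(f^{-1}(z))]^{-1}.
\end{aligned}$$
So the inverse of $f$ and its derivative are recovered from those of $g$, as in the general argument.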
It's a common strategy, both in devising and in presenting proofs, to prove a general statement by
1) proving that, if the statement holds in every case that satisfies a certain assumption, then it holds in all cases, and then
2) making that assumption, and proving the statement.
Often the phrase "Assume without loss of generality that..." crops up, meaning that we make an assumption, but proving the proposition true under this assumption also proves the general case (i.e. we don't lose any generality from the proof).
Why do we do it? Usually to reduce the complexity of an argument, either by cutting down the number of symbols in a calculation or by cutting down the number of cases. These little extraneous cases and symbols often obscure what should be a relatively simple, intuitive argument, so making such an assumption makes the proof more easily digestible, which is always a plus!
I'm guessing Spivak is making the assumption that $\lambda = I$ because it simplifies the notation greatly. You could try recreating the proof without this assumption, with $\lambda$ cropping up everywhere, and you should be able to see why Spivak has done this.
I was hoping to provide another, simple example of an assumption made without loss of generality in a proof, but my mind has gone blank. Maybe a commenter could provide one?
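One standard illustration, offered only as a sketch and not taken from Spivak: to show that $\max(x,y) = \frac{x+y+|x-y|}{2}$ for all real $x, y$, note that both sides are unchanged when $x$ and $y$ are swapped. So we may assume without loss of generality that $x \ge y$, and then
$$\frac{x+y+|x-y|}{2} = \frac{x+y+(x-y)}{2} = x = \max(x,y).$$
The case $x < y$ needs no separate argument, because swapping the names of $x$ and $y$ reduces it to the case already proved.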
Best Answer
If $\mathcal J Df(a)\neq 0$ then the linear transformation $\lambda := Df(a)\colon\mathbb R^n\to \mathbb R^n$ is invertible. Note that $D\lambda(x)=\lambda$ for every $x$, since $\lambda$ is a linear transformation. The same is true, of course, for $\lambda^{-1}.$
Now consider $g:=\lambda^{-1}\circ f.$ We have then by the chain rule,
$Dg(a)=D\lambda^{-1}(f(a))\circ Df(a)=\lambda^{-1}\circ Df(a)=I.$
If the theorem is true for $g$ then $g$ is invertible (in some neighborhood of $a$) and so $f$ is also invertible. Indeed, $g^{-1}=f^{-1}\circ\lambda\Rightarrow f^{-1}=g^{-1}\circ\lambda^{-1}.$
So we may as well assume that $Df(a)=I$ in the first place.
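For anyone who wants to see the reduction numerically, here is a minimal sketch in plain Python. The map `f`, the point `a`, and all helper names (`jacobian`, `inverse_2x2`, `apply`) are hypothetical choices made for this illustration. The script approximates $\lambda = Df(a)$ by central differences, forms $g = \lambda^{-1}\circ f$, and checks that $Dg(a)$ is approximately the identity.

```python
def f(p):
    # Hypothetical example map f: R^2 -> R^2 (an assumption, not from the text).
    x, y = p
    return (x + y * y, 2.0 * y + x * x)


def jacobian(func, p, h=1e-6):
    """Approximate the 2x2 Jacobian of func at p by central differences."""
    cols = []
    for i in range(2):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        fp, fm = func(plus), func(minus)
        # cols[i][k] approximates d(func_k)/d(x_i)
        cols.append([(fp[k] - fm[k]) / (2 * h) for k in range(2)])
    # Transpose so that row k holds the partial derivatives of component k.
    return [[cols[i][k] for i in range(2)] for k in range(2)]


def inverse_2x2(m):
    """Invert a 2x2 matrix given as a list of rows."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def apply(m, v):
    """Apply the 2x2 matrix m to the vector v."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])


a = (1.0, 1.0)
lam = jacobian(f, a)          # lambda = Df(a), approximately [[1, 2], [2, 2]]
lam_inv = inverse_2x2(lam)    # lambda^{-1}


def g(p):
    # g = lambda^{-1} o f
    return apply(lam_inv, f(p))


Dg_a = jacobian(g, a)         # should be close to the identity matrix
print(Dg_a)
```

The finite-difference Jacobian is only an approximation, so the entries of `Dg_a` agree with the identity up to rounding error and not exactly.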