Analysis – Understanding Rudin’s Rank Theorem

Tags: analysis, differential-geometry

In his Principles of Mathematical Analysis, Rudin states the (Constant) Rank Theorem like this:

Theorem Suppose $m,n,r$ are nonnegative integers, $m\ge r, n\ge r$, $F$ is a $C^1$ mapping of an open set $E\subset \mathbb{R}^n$ into $\mathbb{R}^m$, and $F'(x)$ has rank $r$ for every $x\in E$. Fix $a\in E$, put $A = F'(a)$, let $Y_1$ be the range of $A$, let $P$ be a projection in $\mathbb{R}^m$ whose range is $Y_1$, and let $Y_2$ be the null space of $P$. Then there are open sets $U$ and $V$ in $\mathbb{R}^n$, with $a\in U\subset E$, and there is a 1-1 $C^1$ mapping $H$ of $V$ onto $U$ (whose inverse is also of class $C^1$) such that $$F(H(x)) = Ax+\phi(Ax)\;\;\;\;(x\in V)$$where $\phi$ is a $C^1$ mapping of the open set $A(V)\subset Y_1$ into $Y_2$.

From reading about the theorem online, I've come to understand that this theorem is a generalization of the Inverse Function Theorem and that it should say something like "the function looks like its derivative in a small region".

I just don't understand what Rudin is saying here, though. For instance, this part:

Then there are open sets $U$ and $V$ in $\mathbb{R}^n$, with $a\in U\subset E$, and there is a 1-1 $C^1$ mapping $H$ of $V$ onto $U$ (whose inverse is also of class $C^1$) such that $$F(H(x)) = Ax+\phi(Ax)\;\;\;\;(x\in V)$$where $\phi$ is a $C^1$ mapping of the open set $A(V)\subset Y_1$ into $Y_2$.

What is $H$ supposed to represent? And what is $V$? I understand that $U$ is essentially a neighborhood of $a$, but $V$ is a seemingly random open set somewhere in $\mathbb{R}^n$ which serves as the domain of a seemingly random bijection $H$. How is this all related to the projection $P$? Why is its null space $Y_2$ of importance here?

Furthermore, what is the formula

$F(H(x)) = Ax+\phi(Ax)$

trying to say? Perhaps something about $F$ being close to its derivative within $U$, but then again I don't understand what purpose the function $\phi$ serves here. Is it some sort of adjusting factor? I'm very confused and am essentially looking for "notes" and "comments" on the individual sentences in the statement of this theorem.

Also, this post is not a duplicate of this one. I have read the approved answer there a few times and I do understand it, but it's not what I'm looking for in the sense that it completely reformulates the theorem's statement rather than explaining what Rudin is doing with his formulation here.

Thank you.

Best Answer

Rudin's formulation of the theorem is confusing because he introduces only the bare minimum of linear algebra in order to keep the text as concise as possible; furthermore, I'd say about a quarter of the proof is devoted to getting the linear algebra right (the correct ranges, kernels, and direct sum decompositions, even if he doesn't explicitly mention direct sums), which is confusing because he's doing too many things too quickly. The answer below is divided into two main stages: the first collects linear-algebraic facts (basically several equivalent ways of saying something has rank $r$), and the second is a proof of the constant rank theorem from the inverse function theorem. Hopefully, in the process of the proof, you'll see glimpses of what Rudin's formulation is saying.


Stage 1. Linear Algebraic Facts

Let $V,W$ be vector spaces over a field $\Bbb{F}$ (think $\Bbb{R}$ if you wish) of dimensions $n,m$ respectively, and let $T:V\to W$ be a linear transformation of rank $r$. Our goal is to describe a 'normal form' for such maps, and there are several ways of describing this. The simplest is that we fix a matrix representation of $T$, and perform row reduction and column reduction until we get an $r\times r$ identity block in the top left \begin{align} [T]\to \begin{pmatrix} I_r&0\\ 0&0 \end{pmatrix}. \end{align}

Slightly more explicitly, we can describe the procedure as follows. Let $\{u_1,\dots, u_{n-r}\}$ be a basis for $\ker T$, and extend this to a basis $\beta=\{v_1,\dots, v_r,u_1,\dots, u_{n-r}\}$ for $V$. Then, $\{T(v_1),\dots, T(v_r)\}$ is a basis of $\text{image}(T)$, so we can extend this to a basis $\gamma=\{T(v_1),\dots, T(v_r),w_1,\dots, w_{m-r}\}$ of $W$. Then, the matrix representation of $T$ relative to these bases is \begin{align} [T]_{\beta}^{\gamma}&= \begin{pmatrix} I_r&0\\ 0&0 \end{pmatrix}. \end{align} An equivalent way of saying this is that if we let $\Phi_{\beta}:V\to\Bbb{F}^n$ denote the isomorphism induced by the basis $\beta$ (i.e $\Phi_{\beta}(v_1)=e_1,\dots, \Phi_{\beta}(v_r)=e_r,\Phi_{\beta}(u_1)=e_{r+1},\dots,\Phi_{\beta}(u_{n-r})=e_{r+(n-r)}=e_n$), and likewise let $\Phi_{\gamma}:W\to\Bbb{F}^m$ be the corresponding isomorphism, then $\Phi_{\gamma}\circ T\circ\Phi_{\beta}^{-1}:\Bbb{F}^n\to\Bbb{F}^m$ is the basic rank-$r$ transformation \begin{align} (\Phi_{\gamma}\circ T\circ\Phi_{\beta}^{-1})(x_1,\dots, x_n)&=(x_1,\dots, x_r,0,\dots, 0), \end{align} or more succinctly, it's the map $(\xi,\eta)\mapsto (\xi,0)$ from $\Bbb{F}^r\times\Bbb{F}^{n-r}$ to $\Bbb{F}^r\times\Bbb{F}^{m-r}$. In words, given a rank-$r$ linear transformation $T$, we can find a linear change of coordinates (i.e choosing the axes in your vector spaces correctly) on the domain and target (the isomorphisms $\Phi_{\beta},\Phi_{\gamma}$ respectively) such that after changing coordinates, we get the standard rank-$r$ linear map.
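(Not in Rudin, but if you want to see this normal form concretely: the SVD gives a quick way to produce the two changes of basis, since $U^TAV=\Sigma$, and rescaling the $r$ nonzero singular values to $1$ yields exactly the block matrix above. A minimal numerical sketch, with names of my own choosing:)

```python
import numpy as np

def rank_normal_form(A, tol=1e-10):
    """Find invertible P (m x m) and Q (n x n) with P @ A @ Q = [[I_r, 0], [0, 0]].

    Uses the SVD A = U S V^T: since U^T A V = S, rescaling the r nonzero
    singular values to 1 produces the identity block.
    """
    m, n = A.shape
    U, s, Vt = np.linalg.svd(A)
    r = int(np.sum(s > tol))          # numerical rank
    d = np.ones(m)
    d[:r] = 1.0 / s[:r]               # invert the nonzero singular values
    P = np.diag(d) @ U.T              # invertible: product of invertible matrices
    Q = Vt.T
    return P, Q, r

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])          # rank 1
P, Q, r = rank_normal_form(A)
print(r)                              # 1
print(np.round(P @ A @ Q, 10))        # [[1, 0, 0], [0, 0, 0]]
```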

A more abstract (but equivalent) way of phrasing and proving the above result is as follows. Choose complementary subspaces $K',R'$ such that \begin{align} V&=K'\oplus \ker T,\quad\text{and}\quad W=\text{image}(T)\oplus R'. \end{align} Then, the linear map $T:V\to W$ gives rise to a linear map $\tilde{T}:K'\times \ker T\to \text{image}(T)\oplus R'$ (this is always true for direct sum decompositions), such that \begin{align} \tilde{T}(x,y)= (T(x), 0).\tag{$*$} \end{align} (This specific formula holds because the direct summands involve the kernel and image of $T$.) At this stage, if one wants, one can compose with appropriate isomorphisms on the domain and target to obtain the linear map $(\xi,\eta)\mapsto (\xi,0)$ as above. So, the role played by the kernel and range of a linear map is highlighted in the simple 'block decomposition' of $T$ in $(*)$.

In Rudin's formulation, he introduces $Y_1=\text{image}(DF_a)$, and $P$ is a projection with range $Y_1$, so in his case, the kernel of $P$ is like $R'$, the complement to $Y_1$ (i.e he's fixing a direct sum decomposition $\Bbb{R}^m=Y_1\oplus Y_2=\text{image}(DF_a)\oplus \ker P=\text{image}(P)\oplus \ker P$). This can seem very confusing and abstract unless you're already very comfortable with linear algebra.
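To make Rudin's $P$ concrete: any projection with range $Y_1$ will do, and the easiest one to write down numerically is the orthogonal projection onto $\text{image}(A)$ (Rudin doesn't require orthogonality; this is just the simplest choice). A small sketch of my own:

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])                    # rank 1; Y_1 = image(A) = span{(1, 2)}
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
Ur = U[:, :r]                                   # orthonormal basis of Y_1
P = Ur @ Ur.T                                   # orthogonal projection with range Y_1
print(np.allclose(P @ P, P))                    # True: P is idempotent, i.e. a projection
print(np.allclose(P @ A, A))                    # True: P fixes image(A), so range(P) = Y_1
print(np.allclose(P @ np.array([2., -1.]), 0))  # True: (2, -1) spans Y_2 = ker(P) here
```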


Stage 2. Constant Rank Theorem.

Let $E\subset\Bbb{R}^n$ be open and $f:E\to\Bbb{R}^m$ a $C^1$ map of constant rank $r$. The goal is to describe a 'local normal form' for the map $f$. Rather than stating the conclusion of the theorem, let us actually just work out the proof; at each stage I'll try to explain the rationale for what we're doing.

Affine changes of coordinates.

  • We may, without loss of generality, assume that $0\in E$ and that $f(0)=0$ (i.e we consider $\tau_2\circ f\circ \tau_1^{-1}$ where $\tau_1,\tau_2$ are translations in the domain and target respectively).

  • By what I discussed in stage 1, we can find linear isomorphisms $L_1,L_2$ on the domain and target respectively such that the rank $r$ transformation $Df_0$ becomes $L_2\circ Df_0\circ L_1^{-1}: (x,y)\mapsto (x,0)$ from $\Bbb{R}^r\times\Bbb{R}^{n-r}\to\Bbb{R}^r\times\Bbb{R}^{m-r}$. Now, by the chain rule, $L_2\circ Df_0\circ L_1^{-1}=D(L_2\circ f\circ L_1^{-1})_0$ (because $L_1,L_2$ are linear so they are their own derivatives).

Putting these remarks together we may assume, without loss of generality (by composing on the domain and target of $f$ by appropriate affine transformations), that $0\in E$, and $f(0)=0$, and that $Df_0:\Bbb{R}^r\times\Bbb{R}^{n-r}\to\Bbb{R}^r\times\Bbb{R}^{m-r}$ is the linear transformation $Df_0((x,y))=(x,0)$. These two steps should be pretty natural: translate everything for convenience, and linearly arrange things to look nice at one point. Such reductions should be the first thing to do in any complicated proof.
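To see this reduction in action, here is a continuation of the numerical sketch from Stage 1 (it reuses the rank_normal_form helper defined there, so that helper is assumed to be in scope): a toy constant-rank-1 map $g$ of my own whose derivative at $0$ is not yet in normal form, conjugated by the linear maps the helper produces.

```python
import numpy as np

def g(v):                             # toy constant-rank-1 map (my example)
    s = v[0] + v[1]
    return np.array([s, s**2])        # Dg_0 = [[1, 1], [0, 0]]: rank 1, not normal form

Dg0 = np.array([[1., 1.],
                [0., 0.]])
P, Q, r = rank_normal_form(Dg0)       # P @ Dg0 @ Q = [[1, 0], [0, 0]] by construction

def h(v):                             # h = L2 ∘ g ∘ L1^{-1} with L2 = P, L1^{-1} = Q
    return P @ g(Q @ v)

# finite-difference check that Dh_0 is the standard rank-1 map (x, y) -> (x, 0):
eps = 1e-6
Dh0 = np.column_stack([(h(eps * e) - h(-eps * e)) / (2 * eps) for e in np.eye(2)])
print(np.round(Dh0, 6))               # ≈ [[1, 0], [0, 0]]
```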

Straightening out the Domain.

Now, we write $\Bbb{R}^n=\Bbb{R}^r\times\Bbb{R}^{n-r}$ and $\Bbb{R}^m=\Bbb{R}^r\times\Bbb{R}^{m-r}$, and we 'decompose' the map as $f(x,y)= (f_1(x,y), f_2(x,y))$ for $(x,y)\in E$.

Now, the problem is to simplify the way the map $f$ looks. To do this, we introduce the map $\Phi:E\to\Bbb{R}^n=\Bbb{R}^r\times\Bbb{R}^{n-r}$ defined as $\Phi(x,y)=(f_1(x,y),y)$ (notice $\Phi(0,0)=(0,0)$). Let us calculate its derivative at the origin: from the specific structure, we have \begin{align} \Phi'(0,0)= \begin{pmatrix} \frac{\partial f_1}{\partial x}(0,0)& \frac{\partial f_1}{\partial y}(0,0)\\ 0 & I_{n-r} \end{pmatrix}= \begin{pmatrix} I_r&\frac{\partial f_1}{\partial y}(0,0)\\ 0&I_{n-r} \end{pmatrix}. \end{align} In particular, $\Phi'(0,0)$ is invertible, so by the inverse function theorem, $\Phi$ is a local diffeomorphism around the origin. I do not feel like introducing extra letters for smaller open sets, so let us just agree to replace $E$ by a smaller connected open set so that $\Phi:E\to\Phi(E)$ is a diffeomorphism (the reason for connectedness will be clear shortly).
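To have something concrete to carry through the rest of the proof, here is a toy example of my own: $f(x,y)=\big(x+y^2,(x+y^2)^2\big)$, which has constant rank $1$ on all of $\Bbb{R}^2$ (the second row of $Df$ is $2(x+y^2)$ times the first) and already satisfies $f(0,0)=(0,0)$ and $Df_0(x,y)=(x,0)$, so the affine reductions are already done. A quick symbolic check:

```python
import sympy as sp

x, y = sp.symbols('x y')

# Toy example (my choice): f(x, y) = (x + y^2, (x + y^2)^2).
f1 = x + y**2
f2 = (x + y**2)**2
J = sp.Matrix([f1, f2]).jacobian([x, y])
print(sp.simplify(J.det()))            # 0, and row 1 never vanishes => rank 1 everywhere
print(J.subs({x: 0, y: 0}))            # Matrix([[1, 0], [0, 0]]): Df_0(x, y) = (x, 0)

# Phi(x, y) = (f1(x, y), y) has invertible derivative at the origin:
Phi = sp.Matrix([f1, y])
print(Phi.jacobian([x, y]).subs({x: 0, y: 0}))   # identity, so the IFT applies
```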

Now, observe that the map $f\circ\Phi^{-1}:\Phi(E)\to\Bbb{R}^r\times\Bbb{R}^{m-r}$ is given by \begin{align} (f\circ\Phi^{-1})(\xi,\eta)&=\bigg((f_1\circ\Phi^{-1})(\xi,\eta), (f_2\circ\Phi^{-1})(\xi,\eta)\bigg)=\bigg(\xi,(f_2\circ\Phi^{-1})(\xi,\eta)\bigg). \end{align} The last equality is because $f_1\circ\Phi^{-1}=\pi_1\circ\Phi\circ\Phi^{-1}=\pi_1$, where $\pi_1:\Bbb{R}^r\times\Bbb{R}^{n-r}\to\Bbb{R}^r$ is the standard projection. Already, this shows $\Phi$ has partially 'straightened out' the domain of $f$ (picture at the end).

So far, we haven't used the fact that $f$ (and hence $f\circ \Phi^{-1}$) has constant rank. We shall do so now: the derivative of $f\circ \Phi^{-1}$ has the following block structure: for all $(\xi,\eta)\in\Phi(E)$, \begin{align} (f\circ\Phi^{-1})'(\xi,\eta)&= \begin{pmatrix} I_r& 0\\ * & \frac{\partial(f_2\circ\Phi^{-1})}{\partial \eta}(\xi,\eta) \end{pmatrix}, \end{align} where $*$ is actually equal to $\frac{\partial(f_2\circ\Phi^{-1})}{\partial \xi}(\xi,\eta)$, but this is irrelevant for us. Notice that because of the identity block $I_r$ in the top left, the rank of this matrix is $\geq r$. But, by hypothesis, the rank is equal to $r$, so this can happen if and only if the bottom right block vanishes for all $(\xi,\eta)$. But then this implies $f_2\circ\Phi^{-1}$ doesn't depend on $\eta$: $(f_2\circ\Phi^{-1})(\xi,\eta)=(f_2\circ\Phi^{-1})(\xi,0)$ for all $(\xi,\eta)\in \Phi(E)$ (it is in this implication that we use connectedness: if a derivative vanishes identically, then locally the function doesn't depend on those variables, so on a connected set we can establish this everywhere. If you want, you can shrink the open set $E$ even further so that $\Phi(E)$ is an open rectangle, and then this implication should be even more obvious).

As a result of this, we have that \begin{align} (f\circ\Phi^{-1})(\xi,\eta)&=\bigg(\xi,(f_2\circ\Phi^{-1})(\xi,0)\bigg).\tag{$\ddot{\smile}$} \end{align} Notice in particular the RHS doesn't depend on $\eta$. Now, we can truly say $\Phi$ has straightened out the domain of $f$ in the following sense:

[Figure: domain straightening]

So, in the picture, the image of $f$ is given by the graph of the function $\xi\mapsto (f_2\circ\Phi^{-1})(\xi,0)$. Each point on the graph has a certain preimage under $f$, which in the set $E$ is given by some squiggly curve, but its preimage under $f\circ\Phi^{-1}$ is a straight vertical line. So, $f\circ\Phi^{-1}$ is constant on each of these vertical lines.
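Continuing the toy example from above: there $\Phi(x,y)=(x+y^2,y)$ happens to be globally invertible, with $\Phi^{-1}(\xi,\eta)=(\xi-\eta^2,\eta)$, so we can watch $\eta$ drop out explicitly:

```python
import sympy as sp

x, y, xi, eta = sp.symbols('x y xi eta')
f1, f2 = x + y**2, (x + y**2)**2

# Phi^{-1}(xi, eta) = (xi - eta^2, eta), found by solving f1(x, y) = xi, y = eta:
inv = {x: xi - eta**2, y: eta}
comp = sp.Matrix([f1, f2]).subs(inv)
print(sp.simplify(comp.T))    # Matrix([[xi, xi**2]]): eta has dropped out entirely,
                              # so f ∘ Phi^{-1} is constant on each vertical line
                              # {xi = const}, and its image is the graph of xi -> xi^2
```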


Relating to Rudin's Statement

Hopefully now it's clear what $U,V$ are: they're just small open sets which are the domain of a certain diffeomorphism (change of coordinates). In the above presentation, I have (by slight abuse of notation) referred to them as $E$ and $\Phi(E)$. Rudin's $H$ corresponds to my $\Phi^{-1}$. The point is that $\Phi$ straightens out the domain as above. Now, in my formulation, we can write the equality $(\ddot{\smile})$ as \begin{align} (f\circ\Phi^{-1})(\xi,\eta)&=(\xi,0)+(0,(f_2\circ\Phi^{-1})(\xi,0))\tag{$**$}; \end{align} this sum is the analogue of what is in Rudin. It's just that Rudin didn't perform the reductions by affine transformations initially, which is why he has to carry around the null space, range, and projections everywhere in the formulas.

But hopefully the formula as stated in $(**)$ is clear geometrically: a point $(\xi,(f_2\circ\Phi^{-1})(\xi,0))$ on the graph ($F(H(x))$ in Rudin's notation) is specified by its projection $(\xi,0)$ to the horizontal axis ($Ax$ in Rudin's notation), and its projection $(0,(f_2\circ\Phi^{-1})(\xi,0))$ to the vertical axis, which is the amount by which we have to move 'above' the base point $(\xi,0)$ (this extra summand is denoted $\phi(Ax)$ in Rudin's notation).


Straightening out the Target as well.

In Rudin's formulation, he stops above, after straightening out the domain (but he lugs around the original $f$, without having performed the affine changes of coordinates). In the usual formulation of the rank theorem, we go one step further: we flatten out the target space. This is a very obvious step. How can we perform a diffeomorphism on an open subset of $\Bbb{R}^m$ so as to flatten out the graph of a function? We simply take $\Psi(\xi,\zeta):=(\xi,\zeta-(f_2\circ\Phi^{-1})(\xi,0))$. This is obviously a diffeomorphism (the inverse has the same formula, except you replace $-$ with $+$).

Then, $(\Psi\circ f\circ \Phi^{-1})(\xi,\eta)=(\xi,0)$, so after non-linear changes of coordinates on the domain and target, $f$ equals the standard rank-$r$ linear map.
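In the toy example, the graph height is $(f_2\circ\Phi^{-1})(\xi,0)=\xi^2$, so $\Psi(\xi,\zeta)=(\xi,\zeta-\xi^2)$ and the whole composition collapses as promised:

```python
import sympy as sp

xi, zeta = sp.symbols('xi zeta')

# (f ∘ Phi^{-1})(xi, eta) = (xi, xi^2), so subtracting the graph height xi^2
# in the second slot flattens the image onto the horizontal axis:
Psi = sp.Lambda((xi, zeta), sp.Tuple(xi, zeta - xi**2))
print(Psi(xi, xi**2))    # (xi, 0): Psi ∘ f ∘ Phi^{-1} is the standard rank-1 map
# Psi is a diffeomorphism: its inverse is (xi, zeta) -> (xi, zeta + xi**2).
```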


Summary of Proof of Constant Rank Theorem:

  • Translate things to the origin, and compose with linear isomorphisms to ensure the derivative $Df_0$ is the simple rank $r$ map. If you really wanted to, you'd have to write $L_2\circ\tau_2\circ f\circ\tau_1^{-1}\circ L_1^{-1}$ for this new map. Clearly, this is cumbersome, so I abbreviate all of this to just $f$.
  • Introduce $\Phi$ to straighten out the domain; this is a local diffeomorphism by the inverse function theorem. The constant rank of $f$ implies that of $f\circ\Phi^{-1}$, and hence $(f\circ\Phi^{-1})(\xi,\eta)$ doesn't depend on $\eta$. Now, the domain is straightened out, and the image is the graph of a certain function. Rudin stops here.
  • Finally, the usual formulation goes ahead and straightens out the graph in the target space simply by subtracting the height of the graph. This gives us a diffeomorphism $\Psi$. Then, $\Psi\circ f\circ\Phi^{-1}$ has the desired form.