The Meaning of the Implicit Function Theorem

implicit function, implicit-differentiation, implicit-function-theorem, matrices

Let me first say that I've been self-studying the implicit function theorem over the last few weeks, and that my knowledge of linear algebra is still poor. So far I've seen the general statements of the theorem and several examples, up to the Jacobian.

Now, I've come across an interesting PDF, in which it is stated:

"If a function $F$ is vector-valued, a regular point is one where the total derivative (matrix) has linearly independent rows."

Why? Probably I'm not visualizing the concrete situation, but I cannot figure out what's going on. OK, let's say a vector-valued function defines a system of two scalar functions whose outputs together form a vector. The gradients of these two functions are linearly independent. But what does that mean geometrically, and especially how does it relate to the implicit function theorem? I mean, why is it that, if those gradients are independent and the other conditions of the theorem are satisfied, it is possible to write some variables as functions of the others?

I derived this result by considering a system of two three-variable functions $F$ and $G$, and found that it is possible to write $(x,y,z)$ locally as $(x,\alpha(x),\beta(x))$ if the determinant of the $2\times 2$ matrix of the partial derivatives of $F$ and $G$ with respect to $y$ and $z$ is nonzero, which means exactly that the two gradients are linearly independent.

In fact, once we obtain, by differentiating $F(x,\alpha(x),\beta(x))=0$ and $G(x,\alpha(x),\beta(x))=0$ with respect to $x$,

$$\begin{cases} F_x + F_y\,\alpha'(x) + F_z\,\beta'(x) = 0 \\ G_x + G_y\,\alpha'(x) + G_z\,\beta'(x) = 0 \end{cases}$$

we ask, in order to solve this linear system for $\alpha'(x)$ and $\beta'(x)$, that

$$F_y G_z - F_z G_y$$

be different from zero, but this is in fact a condition on the determinant:

$$\det\begin{pmatrix} F_y & F_z \\ G_y & G_z \end{pmatrix} \neq 0$$
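To double-check this, here is a small SymPy computation on a concrete system of my own choosing (not the one from the PDF): it computes the $2\times 2$ determinant and solves the linear system above for $\alpha'(x)$ and $\beta'(x)$.

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

# An example system of two three-variable functions (my own choice):
F = x**2 + y**2 + z**2 - 1      # a sphere
G = x + y + z                   # a plane

# 2x2 matrix of the partials of (F, G) with respect to (y, z)
J = sp.Matrix([[sp.diff(F, y), sp.diff(F, z)],
               [sp.diff(G, y), sp.diff(G, z)]])
print(J.det())                  # 2*y - 2*z: nonzero wherever y != z

# Differentiating F(x, a(x), b(x)) = 0 and G(x, a(x), b(x)) = 0 in x gives a
# linear system for a'(x), b'(x); solving it divides by the determinant above.
ap, bp = sp.symbols('ap bp')    # ap, bp stand for alpha'(x), beta'(x)
sols = sp.solve([sp.diff(F, x) + sp.diff(F, y)*ap + sp.diff(F, z)*bp,
                 sp.diff(G, x) + sp.diff(G, y)*ap + sp.diff(G, z)*bp],
                [ap, bp])
print(sols)                     # both solutions have the determinant (up to a factor) in the denominator
```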

But how? What is the link between the calculus-based conclusion above and the determinant of the Jacobian? I feel I'm missing this point.

Best Answer

Remember the case of the real function of one real variable: $y=f(x)$. At a point $x_0$ where the function is differentiable, you have $f(x)=f(x_0)+f'(x_0)(x-x_0)+\text{ error term}$, where the error term is $o(x-x_0)$ when $x\to x_0$. This means that, close to $x_0$, the function is well-approximated by a linear function $f(x_0)+f'(x_0)(x-x_0)$.
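If you want to see "well-approximated" numerically, here is a minimal sketch (the function and the point are arbitrary choices) showing that the error divided by $|x-x_0|$ goes to zero:

```python
import numpy as np

# f(x) = sin(x), linearized at an arbitrary point x0
f, fprime = np.sin, np.cos
x0 = 0.7

for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    x = x0 + h
    linear = f(x0) + fprime(x0) * (x - x0)
    error = abs(f(x) - linear)
    # error / h -> 0 as h -> 0, i.e. the error is o(x - x0)
    print(f"h = {h:.0e}   error = {error:.3e}   error/h = {error/h:.3e}")
```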

For many variables, the situation is the same: if the function is differentiable at a point, it means that it can be closely approximated by a linear function. For example: let $F:X\to Y$ where $X\subseteq\mathbb R^n$ and $Y\subseteq\mathbb R^m$ ("$m$ functions of $n$ variables"), and let us write $F(x_1,\ldots,x_n)=(F_1(x_1,\ldots,x_n),\ldots,F_m(x_1,\ldots,x_n))$. If we assume that this function is differentiable at a point $(X_1,\ldots,X_n)\in X$, this means that:

$$F(x_1,\ldots,x_n)=F(X_1,\ldots,X_n)+dF_{(X_1,\ldots,X_n)}\left[x_1-X_1,\ldots,x_n-X_n\right]+\text{ error term}$$

where $dF$ (the differential of the function, taken at the point $(X_1,\ldots,X_n)$) is actually a linear map, and the error term is "small" (for a suitable definition of "small") with respect to the vector $(x_1-X_1,\ldots,x_n-X_n)$. It just so happens that, in the one-function-of-one-variable case, any linear map amounts to multiplication by a constant, which we call the derivative of the function; here the derivative is a more complicated object, but of the same nature.
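Here is a small numerical sketch of that statement, with an $F:\mathbb R^3\to\mathbb R^2$ and a point chosen arbitrarily: the error of the linear approximation is small compared to the size of the displacement.

```python
import numpy as np

# A concrete F: R^3 -> R^2 (two functions of three variables), chosen arbitrarily
def F(p):
    x, y, z = p
    return np.array([x**2 + y**2 + z**2, x*y - z])

def dF(p):
    # 2x3 matrix of partial derivatives at p (the matrix of the linear map dF_p)
    x, y, z = p
    return np.array([[2*x, 2*y, 2*z],
                     [  y,   x,  -1]])

P = np.array([1.0, 2.0, 0.5])
for eps in [1e-1, 1e-2, 1e-3]:
    h = eps * np.array([0.3, -0.7, 0.2])            # a small displacement
    error = np.linalg.norm(F(P + h) - (F(P) + dF(P) @ h))
    # the error is small compared to ||h||: error/||h|| -> 0
    print(f"||h|| = {np.linalg.norm(h):.2e}   error/||h|| = {error/np.linalg.norm(h):.2e}")
```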

Later on, you learn the term "tangent surface": at the point $(X_1,\ldots,X_n)$, if you forget about the error term, you get a linear function that approximates the original function well near that point:

$$F(X_1,\ldots,X_n)+dF_{(X_1,\ldots,X_n)}\left[x_1-X_1,\ldots,x_n-X_n\right]$$

The image of this map (the "tangent surface") is a flat surface (an affine subspace of dimension at most $n$) in $\mathbb R^m$ which passes through $F(X_1,\ldots,X_n)$, just like the original function $F$, and is "close" to it in a neighbourhood of $(X_1,\ldots,X_n)$.

Also, you learn that, in the given coordinates, the map $dF$ has a matrix consisting of the partial derivatives of the functions $F_1,\ldots,F_m$. Thus the determinants of the square submatrices of that matrix are exactly Jacobians.
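For instance (same arbitrary $F$ as in the sketch above), you can compute the matrix of $dF$ and its $2\times 2$ Jacobians symbolically:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
F1 = x**2 + y**2 + z**2          # example components, chosen arbitrarily
F2 = x*y - z

# Matrix of dF in the given coordinates: the rows are the gradients of F1, F2
J = sp.Matrix([F1, F2]).jacobian([x, y, z])
print(J)                         # 2x3 matrix of partial derivatives

# Determinants of the 2x2 submatrices: the Jacobians of (F1, F2) with respect
# to each pair of variables
for cols, name in [([0, 1], '(x,y)'), ([0, 2], '(x,z)'), ([1, 2], '(y,z)')]:
    print(name, sp.simplify(J.extract([0, 1], cols).det()))
```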

The bigger point here is that you can use the machinery of linear algebra to study the behaviour of the function $F$ near the chosen point. For example, when can you "invert" a linear map? You know that this depends on its rank; in particular, if the rank is $m$ (linearly independent rows), then one of the $m\times m$ submatrices of its matrix has nonzero determinant. That immediately lets you solve the linearized equation: for every choice of the other $n-m$ variables, you can solve for the $m$ variables corresponding to the (linearly independent) columns of that submatrix.
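A linear-algebra sketch of that step, with a made-up $2\times 3$ matrix of rank $2$ standing in for the matrix of $dF$: pick an invertible $2\times 2$ submatrix and solve the linearized equation for those two variables in terms of the remaining one.

```python
import numpy as np

# A 2x3 matrix of rank 2 (made up), thought of as the matrix of dF
A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 4.0]])

# The submatrix in the columns for (y, z) is invertible:
A_yz = A[:, [1, 2]]
print(np.linalg.det(A_yz))        # nonzero, so these columns are independent

# Solve A @ (dx, dy, dz) = 0 for (dy, dz), given any choice of dx:
dx = 1.0
dy, dz = np.linalg.solve(A_yz, -A[:, 0] * dx)
print(dy, dz, A @ np.array([dx, dy, dz]))   # the last vector is (numerically) zero
```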

In effect: [1] you have replaced $F$ with its linear approximation, and [2] you know how to solve that approximation for some of the variables in terms of the others. The essence of the implicit function theorem is that you can then do the same with the original function $F$, as long as you stay close enough to the point $(X_1,\ldots,X_n)$ where you are running your analysis.
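Numerically, this is exactly what a local solver exploits. A sketch with SciPy, on a made-up system where the $2\times 2$ Jacobian in $(y,z)$ is invertible near the chosen point: for each $x$ near $0$ it finds the corresponding $(y,z)$, i.e. the implicit functions $\alpha(x)$, $\beta(x)$.

```python
import numpy as np
from scipy.optimize import fsolve

# Example system (my own): F = x^2 + y^2 + z^2 - 1 = 0,  G = x + y + z = 0.
# Near a point where the 2x2 Jacobian in (y, z) is invertible, the implicit
# function theorem says (y, z) is locally a function of x; fsolve finds it.
def solve_yz(x, guess=(0.7, -0.7)):
    def eqs(v):
        y, z = v
        return [x**2 + y**2 + z**2 - 1, x + y + z]
    return fsolve(eqs, guess)

for x in [0.0, 0.05, 0.1]:
    y, z = solve_yz(x)
    print(f"x = {x:+.2f}   y = {y:+.5f}   z = {z:+.5f}")
```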
