[Math] Need help understanding the concept of the Jacobian Matrix and its relation to differentiation

calculus, real-analysis, vector-spaces

As the question suggests, I need help understanding the concepts around the following differentiation of vector-valued functions:

1) The Norm. I understand the norm to be defined as follows:

In the metric space defined on $\Bbb R^n$, to each vector $x = (x_1, x_2, \ldots, x_n)$ one can associate the nonnegative number $$\|x\| = d(x,0)$$
What on earth does this mean?

2) The Jacobian Matrix and its use in the definitions of differentiability and continuity. I understand the Jacobian matrix to be the gradient matrix, i.e., a matrix whose entries are the partial derivatives of a function $f: \Bbb R^n \to \Bbb R^m$ at a point $a \in \Bbb R^n$.

Now then, a function is said to be differentiable if
$$ \lim_{h \to 0} \frac{\|f(u+h) - f(u) - Ah\|}{\|h\|} = 0,$$ where $u$ and $h$ are vectors in $\Bbb R^n$ and (this is where I'm perplexed) $Ah = J_f(u)h \in \Bbb R^m$.

What are we saying here? That $Ah$ is the derivative of said function? And are we saying that, essentially, a vector-valued function is differentiated into a vector of its partial derivatives?

Best Answer

Don't worry, multidimensional analysis is a shock to the system the first time that you see it.


1) The norm is the distance to the origin, $d(x,0)$, with the metric assumed to be Euclidean, so $$\lVert x \rVert = \sqrt{\sum_{i=1}^{n} x_i^2}$$
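
For instance, with $x = (1,2,2) \in \mathbb{R}^3$ we get $$\lVert x \rVert = \sqrt{1^2 + 2^2 + 2^2} = 3,$$ which is exactly the straight-line (Euclidean) distance from the point $(1,2,2)$ to the origin. So the quoted definition is just saying: the norm of a vector is the length of the arrow from $0$ to $x$, written in terms of the metric $d$.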


2) Okay, let's get a grip on this. $$f:\mathbb{R}^n \to \mathbb{R}^m,$$ so we have component functions $f_1(x_1,\ldots,x_n), \ldots, f_m(x_1,\ldots,x_n)$. These are really $m$ separate functions $f_i:\mathbb{R}^n \to \mathbb{R}$.

Great; now, what is a derivative for a function like $f_i$? It's the gradient $\nabla f_i$. Hence the whole lot of derivative information is encoded in an $m\times n$ matrix $A=J_f(u)$ with components $$A_{ij} = \partial_j f_i$$
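
To make this concrete, take a made-up example $f:\mathbb{R}^2 \to \mathbb{R}^3$ with $f(x,y) = (x^2 y, \sin x, e^y)$, so $f_1 = x^2 y$, $f_2 = \sin x$, $f_3 = e^y$. Each row of $J_f$ is one gradient:
$$J_f(x,y) = \begin{pmatrix} 2xy & x^2 \\ \cos x & 0 \\ 0 & e^y \end{pmatrix},$$
an $m \times n = 3 \times 2$ matrix: row $i$ is $\nabla f_i$, and column $j$ holds the partial derivatives with respect to $x_j$.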

Right! But now we want a formal definition of differentiability. How do we do it for a single $f_i=g$? We say $\nabla g$ is the derivative if it tells us what the *directional* derivatives are. How does it do this?

We want $g(x+dx) \approx g(x) + dx \cdot \nabla g$: small changes get dotted with the derivative. So formally, we want a vector $v$ such that $$\lim_{h\to 0} \frac{|g(x+h) - g(x) - h \cdot v|}{\lVert h \rVert} = 0,$$ so that the error shrinks faster than $\lVert h \rVert$.
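
To check this definition on a simple single-output example, take $g(x) = x \cdot x = \lVert x \rVert^2$ and the candidate $v = \nabla g(x) = 2x$. Then $$g(x+h) - g(x) - h \cdot v = (x+h)\cdot(x+h) - x \cdot x - 2x \cdot h = h \cdot h = \lVert h \rVert^2,$$ so the quotient is $\lVert h \rVert^2 / \lVert h \rVert = \lVert h \rVert \to 0$: the error really is smaller than first order in $h$, which is exactly what the limit asks for.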

But now each $f_i$ gives its own vector $v_i$. Stacking these as the rows of $A$, and putting the norm of the error vector in the numerator (which bounds the error of every single $f_i$ at once), gives exactly the definition you quoted!

Edit: To summarize, $h$ is a small change in the coordinates, $Ah$ is the directional derivative of all the separate $f_i$s in this direction, and $A$ is the 'gradient' containing all derivative information.
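
For instance, with the toy map $f(x,y) = (x^2 y, \sin x, e^y)$ from above, at the point $u = (1,0)$ we have $$J_f(1,0) = \begin{pmatrix} 0 & 1 \\ \cos 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad J_f(1,0)\,h = (h_2,\ h_1 \cos 1,\ h_2)$$ for a small step $h = (h_1, h_2)$, and $f(u+h) \approx f(u) + J_f(u)h$. So $A = J_f(u)$ is the derivative (a linear map), while $Ah$ is that linear map applied to the particular displacement $h$: the first-order change in all $m$ outputs at once.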