[Math] Understanding higher derivatives as multilinear mappings

differential-geometry, multilinear-algebra, multivariable-calculus

I'm trying to understand how to relate the higher derivatives to multilinear mappings.

Let $f$ be a differentiable function. Then, since we have $Df:V\subset \mathbb{R}^n\rightarrow \text{Lin}(\mathbb{R}^n,\mathbb{R}^p)$, can I say that $Df\in \text{Lin}(\mathbb{R}^n,\text{Lin}(\mathbb{R}^n,\mathbb{R}^p))$?

I'm trying to relate this way of thinking about higher-order derivatives (new for me, at least) with what I already know, for example computing the Hessian matrix by taking the usual partial derivatives.
The book I'm using has the following theorem to allow me to compute the derivatives of multilinear mappings.

[Image: theorem on computing the derivative of a multilinear mapping]

So, if I can think of $Df$ as in $\text{Lin}(\mathbb{R}^n\times\mathbb{R}^n,\mathbb{R}^p)$, then by the above theorem we have $D(Df)(a_1,a_2)(h_1,h_2)=Df(h_1)(a_2)+Df(a_1)(h_2)$. However, I'm not seeing how this relates to the usual, simpler calculation of the partial derivatives.

Any help would be appreciated.

Best Answer

If $f \colon V \rightarrow \mathbb{R}^p$ then $Df \colon V \rightarrow \operatorname{Lin}(\mathbb{R}^n,\mathbb{R}^p)$ and so $D^2f \colon V \rightarrow \operatorname{Lin}(\mathbb{R}^n, \operatorname{Lin}(\mathbb{R}^n, \mathbb{R}^p))$. Let us try to unravel what this means.

First, note that a linear map $T \colon \mathbb{R}^n \rightarrow \operatorname{Lin}(\mathbb{R}^n, \mathbb{R}^p)$ is the same thing as a bilinear map $S \colon \mathbb{R}^n \times \mathbb{R}^n \rightarrow \mathbb{R}^p$. More precisely, we can define a map $\varphi \colon \operatorname{Lin}(\mathbb{R}^n, \operatorname{Lin}(\mathbb{R}^n, \mathbb{R}^p)) \rightarrow \operatorname{Lin}^2(\mathbb{R}^n, \mathbb{R}^p)$ by setting $\varphi(T)(v,w) := T(v)(w)$ and this map is an isomorphism. More generally, one can construct a similar identification
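Numerically, the currying identification $\varphi$ can be sketched in Python (a minimal illustration of my own, not from the original answer; the matrix `A` representing the bilinear form is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3  # illustration with p = 1, i.e. a scalar-valued bilinear form
A = rng.standard_normal((n, n))

# Bilinear map S : R^n x R^n -> R, S(v, w) = v^T A w
def S(v, w):
    return v @ A @ w

# Curried version T : R^n -> Lin(R^n, R): T(v) is the functional w -> v^T A w
def T(v):
    return lambda w: v @ A @ w

v, w = rng.standard_normal(n), rng.standard_normal(n)
# phi(T)(v, w) := T(v)(w) recovers S(v, w), exhibiting the isomorphism
assert np.isclose(T(v)(w), S(v, w))
```

Nothing is lost in either direction: `S` and `T` carry exactly the same data, packaged differently.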

$$ \underbrace{\operatorname{Lin}(\mathbb{R}^n, \operatorname{Lin}(\mathbb{R}^n, \dots, \operatorname{Lin}(\mathbb{R}^n, \mathbb{R}^p) \dots ))}_{k \text{ times}} \approx \operatorname{Lin}^k(\mathbb{R}^n, \mathbb{R}^p) $$

which allows you to identify the $k$-th derivative $D^kf|_{q}$ at a point $q$ with a $k$-multilinear map (the bar notation $D^kf|_{q}$ distinguishes the base point from the vector arguments and reduces the clutter of parentheses).

Now, consider the case where $p = 1$, so that $f$ is a scalar function. The first derivative $Df \colon V \rightarrow \operatorname{Lin}(\mathbb{R}^n,\mathbb{R}) = \left( \mathbb{R}^{n} \right)^{*}$ sends each point $q \in V$ to a linear functional $(Df)(q) = Df|_{q}$ which acts as a directional derivative:

$$ (Df|_{q})(v) = \lim_{t \to 0} \frac{f(q + tv) - f(q)}{t}. $$

In particular, if we take $v = e_i$ (where $(e_1,\dots,e_n)$ is the standard basis of $\mathbb{R}^n$), we get $(Df|_{q})(e_i) = \frac{\partial f}{\partial x^i}(q) = \frac{\partial f}{\partial x^i}|_{q}$. Thus, if we represent each $Df|_{q}$ by the (row) vector $(Df|_{q}(e_1), \dots, Df|_{q}(e_n))$, we have $Df "=" \nabla f$ and we recover the usual notion of the gradient of a function.
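A quick finite-difference sanity check of this (my own sketch; the test function $f(x,y) = x^2 y$ and the helper names are assumptions for illustration):

```python
import numpy as np

def f(x):
    # scalar function on R^2: f(x, y) = x^2 * y
    return x[0] ** 2 * x[1]

def directional_derivative(f, q, v, t=1e-6):
    # (Df|_q)(v) ~ (f(q + t v) - f(q)) / t for small t
    return (f(q + t * v) - f(q)) / t

q = np.array([2.0, 3.0])
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
# analytic partials at q = (2, 3): df/dx = 2xy = 12, df/dy = x^2 = 4
grad = np.array([directional_derivative(f, q, e1),
                 directional_derivative(f, q, e2)])
assert np.allclose(grad, [12.0, 4.0], atol=1e-4)
```

Evaluating $Df|_q$ on the standard basis indeed reproduces the components of $\nabla f(q)$.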

Let us move to the second derivative. By the identification above, we can think of $D(Df)(q) = D(Df)|_{q} = D^2f|_{q}$ (the second derivative at a point $q \in V$) as a bilinear map $\varphi(D^2f|_{q}) \colon \mathbb{R}^n \times \mathbb{R}^n \rightarrow \mathbb{R}$ which is usually also denoted by $D^2f$ (making the identification above invisible) and is simply a bilinear form on $\mathbb{R}^n$. Any bilinear form is completely determined by the matrix representing it with respect to some basis so let us consider the matrix $A_{ij} = \varphi(D^2f|_{q})(e_i, e_j)$ where $(e_i)$ is the standard basis of $\mathbb{R}^n$. I claim that $A = \operatorname{Hess}(f)|_{q}$. To verify it, we unravel all the relevant definitions and properties of the derivative:

$$ A_{ij} = \varphi(D^2f|_{q})(e_i,e_j) = ((D(Df)|_{q})(e_i))(e_j) = \left( \lim_{t \to 0} \frac{Df|_{q + te_i} - Df|_{q}}{t} \right)(e_j) = \lim_{t \to 0} \frac{Df|_{q + te_i}(e_j) - Df|_{q}(e_j)}{t} = \lim_{t \to 0} \frac{\frac{\partial f}{\partial x^j}(q + te_i) - \frac{\partial f}{\partial x^j}(q)}{t} = \frac{\partial^2 f}{\partial x^i \partial x^j}(q).$$
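The chain of equalities above can be checked numerically by nesting finite differences: differentiate each partial $\frac{\partial f}{\partial x^j}$ in the direction $e_i$ and compare with the analytic Hessian (a sketch of my own; the test function $f(x,y) = x^2 y + y^3$ is an assumption for illustration):

```python
import numpy as np

def f(x):
    # f(x, y) = x^2 * y + y^3
    return x[0] ** 2 * x[1] + x[1] ** 3

def partial(f, q, j, t=1e-5):
    # central difference for df/dx^j at q
    e = np.zeros(2); e[j] = 1.0
    return (f(q + t * e) - f(q - t * e)) / (2 * t)

def hessian_entry(f, q, i, j, t=1e-4):
    # A_ij = directional derivative of (q -> df/dx^j (q)) in direction e_i
    e = np.zeros(2); e[i] = 1.0
    return (partial(f, q + t * e, j) - partial(f, q - t * e, j)) / (2 * t)

q = np.array([2.0, 3.0])
# analytic Hessian: [[2y, 2x], [2x, 6y]] = [[6, 4], [4, 18]] at (2, 3)
A = np.array([[hessian_entry(f, q, i, j) for j in range(2)] for i in range(2)])
assert np.allclose(A, [[6.0, 4.0], [4.0, 18.0]], atol=1e-3)
```

Note that the computed matrix is symmetric, as Clairaut's theorem guarantees for a $C^2$ function.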

More generally, you should think of $(D^k f)|_{q}(v_1, \dots, v_k)$ as first taking the directional derivative of $f$ in the direction $v_1$, then taking the directional derivative of the result in the direction $v_2$, and so on, finally evaluating the result at the point $q$.


Regarding the theorem you quote, let me demonstrate it in the case $k = 2$. Thus, we consider a function $f \colon \mathbb{R}^n \times \mathbb{R}^n \rightarrow \mathbb{R}^p$ which is bilinear and want to understand the derivative. For example, if $n = 2$ and $p = 1$ we can consider

$$ f((x,y),(u,v)) = 2xu + 4 xv + 5yu + 6yv. $$

The derivative should be a map $Df \colon \mathbb{R}^n \times \mathbb{R}^n \rightarrow \operatorname{Lin}(\mathbb{R}^n \times \mathbb{R}^n, \mathbb{R}^p)$ and we have

$$ Df|_{(q_1,q_2)}(v_1,v_2) = f(q_1,v_2) + f(v_1,q_2). $$

How does this work for our function $f$? For example,

$$ Df|_{(x_0,y_0),(u_0,v_0)}((1,0),(0,0)) = \frac{\partial f}{\partial x}\big|_{(x_0,y_0),(u_0,v_0)} = (2u + 4v)\big|_{(x_0,y_0),(u_0,v_0)} = 2u_0 + 4v_0 \\ = f((x_0,y_0),(0,0)) + f((1,0),(u_0,v_0)).$$
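This computation can be verified numerically for the concrete bilinear map above: a finite-difference quotient converges to $f(q_1,v_2) + f(v_1,q_2)$ (a sketch of my own; the sample points are arbitrary):

```python
import numpy as np

def f(a, b):
    # the bilinear map from the example: f((x, y), (u, v)) = 2xu + 4xv + 5yu + 6yv
    (x, y), (u, v) = a, b
    return 2 * x * u + 4 * x * v + 5 * y * u + 6 * y * v

def Df(q1, q2, v1, v2):
    # Df|_(q1, q2)(v1, v2) = f(q1, v2) + f(v1, q2)
    return f(q1, v2) + f(v1, q2)

q1, q2 = np.array([1.0, 2.0]), np.array([3.0, 4.0])
v1, v2 = np.array([1.0, 0.0]), np.array([0.0, 0.0])

# finite-difference quotient (f(q + t v) - f(q)) / t approximates Df|_q(v)
t = 1e-6
fd = (f(q1 + t * v1, q2 + t * v2) - f(q1, q2)) / t
assert np.isclose(fd, Df(q1, q2, v1, v2), atol=1e-4)
# and it equals df/dx = 2u + 4v at (u, v) = (3, 4): 2*3 + 4*4 = 22
assert np.isclose(Df(q1, q2, v1, v2), 22.0)
```

The approximation error here is exactly $t \cdot f(v_1, v_2)$, since for a bilinear $f$ the expansion $f(q + tv) = f(q_1,q_2) + t\,(f(q_1,v_2) + f(v_1,q_2)) + t^2 f(v_1,v_2)$ is exact.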
