I am supposed to express the second Fréchet derivative of a function $f:\mathbb{R}^n \rightarrow \mathbb{R}^m$ in terms of its partial derivatives. I know how to do this for the first Fréchet derivative, which is essentially just the Jacobian matrix, but for the second one I honestly have no idea.
[Math] How to write down the second Fréchet-derivative
analysis, calculus, derivatives, partial derivative, real-analysis
Related Solutions
The Jacobian matrix is the best linear approximation to $f$ at a particular point. However, if you change the point, you get a different Jacobian. The second derivative quantifies how the Jacobian changes as the point of approximation changes - the "change of the change".
To that end, we can think of the derivative $D$ as a mapping from the domain of the original function to the space of linear maps $\mathcal{L}(X,Y)$ with domain $X$ and range $Y$ the same as the original function: \begin{align} f:& X \rightarrow Y \\ Df:& X \rightarrow \mathcal{L}(X,Y) \end{align} The derivative of $f$, $Df$, is a function where you put in a point and it gives you a linear function, $$Df(x_0) = \text{best linear function approximating $f$ near }x_0.$$
In matrix form, $Df(x_0)$ is the Jacobian matrix $J$ at $x_0$: $Df(x_0)(y) = J|_{x_0} y$.
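As a concrete sanity check, the action of $Df(x_0)$ as a linear map can be approximated numerically (a minimal sketch assuming numpy; the function `f` and the point `x0` below are hypothetical examples, not part of the question):

```python
import numpy as np

# Hypothetical example: f maps R^2 into R^3.
def f(x):
    return np.array([x[0]**2, x[0]*x[1], np.sin(x[1])])

def jacobian_fd(f, x0, h=1e-6):
    """Forward-difference approximation of the Jacobian J at x0."""
    n = len(x0)
    m = len(f(x0))
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (f(x0 + e) - f(x0)) / h
    return J

x0 = np.array([1.0, 2.0])
y = np.array([0.3, -0.5])
J = jacobian_fd(f, x0)
# Df(x0)(y) is just the matrix-vector product J y.
print(J @ y)
```

Here `jacobian_fd` is a throwaway helper; the point is only that once $J$ is in hand, applying $Df(x_0)$ to a vector is matrix-vector multiplication.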
Since $Df$ is itself a function, we can take its derivative, and so on, obtaining a tower of higher and higher derivatives as follows:
\begin{align} f:&\mathbb{R}^n \rightarrow \mathbb{R}^m \\ Df:&\mathbb{R}^n \rightarrow \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m) \\ D(Df) = D^2f:&\mathbb{R}^n \rightarrow \mathcal{L}(\mathbb{R}^n, \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)) \\ D(D^2f) = D^3f:&\mathbb{R}^n \rightarrow \mathcal{L}(\mathbb{R}^n,\mathcal{L}(\mathbb{R}^n, \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m))) \\ \dots \end{align}
Now this gets confusing fast (spaces of linear maps mapping to spaces of linear maps mapping to... ack!!). Luckily there is an isometric isomorphism theorem saying that everything boils down to multilinear maps: $$\mathcal{L}^n(X,\mathcal{L}^m(X,Y)) \cong \mathcal{L}^{n+m}(X,Y),$$ where $\mathcal{L}^k(X,Y)$ is the space of $k$-linear maps from $X$ to $Y$, and $\cong$ denotes an isometric isomorphism of function spaces. In more detail, what it means for $g$ to be in $\mathcal{L}^k(X,Y)$ is that $g : X \times \dots \times X \rightarrow Y$, and $g$ is independently linear in each of its entries: $$g(x_a + x_b,z,w) = g(x_a,z,w) + g(x_b,z,w),$$ $$g(x,z_a+z_b,w) = g(x,z_a,w) + g(x,z_b,w),$$ and so on.
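The currying behind this isomorphism can be made concrete (a sketch assuming numpy; the tensor `T` is a randomly generated example): a bilinear map on $\mathbb{R}^n$ with values in $\mathbb{R}^m$ is stored as an $m \times n \times n$ array, and feeding in only the first argument leaves a linear map, i.e. an element of $\mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$:

```python
import numpy as np

# Random example tensor representing a bilinear map R^n x R^n -> R^m.
rng = np.random.default_rng(0)
n, m = 3, 2
T = rng.standard_normal((m, n, n))

def bilinear(u, v):
    """The bilinear map (u, v) -> T(u, v) in R^m."""
    return np.einsum('kij,i,j->k', T, u, v)

def curried(u):
    """Feed in only u: the result is the linear map v -> T(u, v),
    i.e. an m-by-n matrix."""
    return np.einsum('kij,i->kj', T, u)

u, v = rng.standard_normal(n), rng.standard_normal(n)
# Applying the curried matrix to v agrees with the bilinear map.
print(np.allclose(curried(u) @ v, bilinear(u, v)))  # True
```

So an element of $\mathcal{L}(X, \mathcal{L}(X,Y))$ and an element of $\mathcal{L}^2(X,Y)$ carry exactly the same data.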
So, now we can simplify our tower of derivatives using spaces of multilinear functions: \begin{align} f:&\mathbb{R}^n \rightarrow \mathbb{R}^m \\ Df:&\mathbb{R}^n \rightarrow \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m) \\ D^2f:&\mathbb{R}^n \rightarrow \mathcal{L}^2(\mathbb{R}^n, \mathbb{R}^m) \\ D^3f:&\mathbb{R}^n \rightarrow \mathcal{L}^3(\mathbb{R}^n, \mathbb{R}^m) \\ \dots \end{align}
So, from this picture it is pretty clear what the second derivative of your function $f:X \rightarrow Y$ is at a point. It is a bilinear map from $X \times X$ to $Y$. You put in two vectors from $X$, and it gives out a vector in $Y$, and does so in a way that is linear in each input independently.
If you have a basis $\{ b_i\}$ of $n$ vectors for $X$ and basis $\{e_i\}$ of $m$ vectors for $Y$, you could completely characterize the second derivative by a 3D $n$-by-$n$-by-$m$ array of numbers $T_{ijk}$ where the $(i,j,k)$'th entry is found by applying the bilinear function with $b_i$ in the first argument and $b_j$ in the second argument, and then taking the component of the vector you get out in the $e_k$ direction: $$T_{ijk} = e_k^T D^2f(x_0)(b_i,b_j).$$
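In the standard bases of $\mathbb{R}^n$ and $\mathbb{R}^m$, these entries are just the mixed second partials, $T_{ijk} = \partial^2 f_k / \partial x_i \partial x_j$. A minimal finite-difference sketch of the full array (assuming numpy; `f` and `x0` are hypothetical examples):

```python
import numpy as np

# Hypothetical example: f maps R^2 into R^2.
def f(x):
    return np.array([x[0]**2 * x[1], x[0] + x[1]**3])

def second_derivative_tensor(f, x0, h=1e-4):
    """Finite-difference approximation of the n-by-n-by-m array
    T[i, j, k] = e_k^T D^2 f(x0)(b_i, b_j), i.e. d^2 f_k / dx_i dx_j
    in the standard bases."""
    n = len(x0)
    T = np.zeros((n, n, len(f(x0))))
    I = np.eye(n)
    for i in range(n):
        for j in range(n):
            # Second-order difference quotient in directions e_i and e_j.
            T[i, j] = (f(x0 + h*(I[i] + I[j])) - f(x0 + h*I[i])
                       - f(x0 + h*I[j]) + f(x0)) / h**2
    return T

x0 = np.array([1.0, 2.0])
T = second_derivative_tensor(f, x0)
# T[:, :, k] is the Hessian of the k-th component f_k at x0.
print(T[:, :, 0])
```

For this example, $T[:,:,0]$ approximates the Hessian of $f_1 = x_1^2 x_2$ at $(1,2)$, namely $\begin{pmatrix}4&2\\2&0\end{pmatrix}$.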
You have the first derivative, $D\det(X)(A) = (\det{X})\operatorname{tr}{(X^{-1}A)} $, so you want to find $D^2\det(X)(A,B) = D(D\det(X)(A))(B)$. You can do this using the multivariable product and chain rules, provided you know the first derivative of $X \mapsto X^{-1}$. We have $$ (X^{-1}+\varepsilon K)(X+\varepsilon B) = I+\varepsilon (KX+X^{-1}B)+O(\varepsilon^2), $$ which suggests we put $ K=-X^{-1}BX^{-1} $. Then we have $$ (X+\varepsilon B)\left( (X+\varepsilon B)^{-1}-(X^{-1} - \varepsilon X^{-1}BX^{-1}) \right) = I -(I+\varepsilon BX^{-1})+\varepsilon BX^{-1}+O(\varepsilon^2) = O(\varepsilon^2), $$ and it follows that $DX^{-1}(B)=-X^{-1}BX^{-1}$.
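The formula $DX^{-1}(B) = -X^{-1}BX^{-1}$ is easy to check numerically (a sketch assuming numpy; the matrices below are random examples, with a diagonal shift to keep $X$ safely invertible):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 3)) + 5*np.eye(3)  # shift keeps X well-conditioned
B = rng.standard_normal((3, 3))
Xinv = np.linalg.inv(X)

# Finite-difference directional derivative of X -> X^{-1} in direction B.
eps = 1e-6
fd = (np.linalg.inv(X + eps*B) - Xinv) / eps

# Closed-form derivative from the expansion above.
formula = -Xinv @ B @ Xinv

print(np.max(np.abs(fd - formula)))  # small, O(eps)
```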
Then the product rule gives $$ D((\det{X})\operatorname{tr}{(X^{-1}A)})(B) = D(\det{X})(B) \operatorname{tr}{(X^{-1}A)} + (\det{X})D(\operatorname{tr}{(X^{-1}A)})(B), $$ and the trace is linear, so the derivative passes inside and we find $$ D((\det{X})\operatorname{tr}{(X^{-1}A)})(B) = (\det{X}) \big( \operatorname{tr}{(X^{-1}B)} \operatorname{tr}{(X^{-1}A)} + \operatorname{tr}{(D(X^{-1})(B)A)} \big) \\ = (\det{X}) \big( \operatorname{tr}{(X^{-1}B)} \operatorname{tr}{(X^{-1}A)} - \operatorname{tr}{(X^{-1}BX^{-1}A)} \big). $$ This is symmetric in $A$ and $B$ because the trace is cyclic.
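One can verify the final expression numerically by differencing the first derivative in the direction $B$ (a sketch assuming numpy; the matrices are random examples):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((3, 3)) + 5*np.eye(3)  # shift keeps X well-conditioned
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
Xinv = np.linalg.inv(X)

def Ddet(X, A):
    """First derivative of det at X in direction A: det(X) tr(X^{-1} A)."""
    return np.linalg.det(X) * np.trace(np.linalg.inv(X) @ A)

# Finite-difference second derivative: difference Ddet(., A) in direction B.
eps = 1e-6
fd = (Ddet(X + eps*B, A) - Ddet(X, A)) / eps

# Closed-form second derivative from the product rule above.
formula = np.linalg.det(X) * (np.trace(Xinv @ B) * np.trace(Xinv @ A)
                              - np.trace(Xinv @ B @ Xinv @ A))

print(abs(fd - formula))  # small relative to |formula|
```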
Best Answer
It depends on what you will do with the derivatives after writing them down. Different kinds of computations call for different notational choices. One possibility is to write $$D^2f = D^2 \begin{pmatrix}f_1 \\ f_2 \\ \vdots \\ f_m\end{pmatrix} =\begin{pmatrix}H_1 \\ H_2 \\ \vdots \\ H_m\end{pmatrix}$$ where $$H_k= \begin{pmatrix}\frac{\partial^2 f_k}{\partial x_i \partial x_j}\end{pmatrix}$$ is the Hessian of $f_k$. I.e., a column of matrices.
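As a finite-difference sketch of this "column of Hessians" representation (assuming numpy; `f` and `x0` are hypothetical examples), note that component $k$ of $D^2f(x_0)(u,v)$ is then $u^T H_k v$:

```python
import numpy as np

# Hypothetical example: f maps R^2 into R^2.
def f(x):
    return np.array([x[0]**2 * x[1], np.exp(x[0]) * x[1]])

def hessian_fd(g, x0, h=1e-4):
    """Finite-difference Hessian of a scalar function g at x0."""
    n = len(x0)
    H = np.zeros((n, n))
    I = np.eye(n)
    for i in range(n):
        for j in range(n):
            H[i, j] = (g(x0 + h*(I[i] + I[j])) - g(x0 + h*I[i])
                       - g(x0 + h*I[j]) + g(x0)) / h**2
    return H

x0 = np.array([1.0, 2.0])
m = len(f(x0))
# The "column of matrices": one Hessian H_k per component f_k.
Hs = [hessian_fd(lambda x, k=k: f(x)[k], x0) for k in range(m)]
# Component k of D^2 f(x0)(u, v) is u^T H_k v.
u, v = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(np.array([u @ H @ v for H in Hs]))
```

With $u = e_1$ and $v = e_2$ this just reads off the mixed partial $\partial^2 f_k/\partial x_1 \partial x_2$ from each Hessian.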